We characterize conjugate nonparametric Bayesian models as projective
limits of conjugate, finite-dimensional Bayesian models. In
particular, we identify a large class of nonparametric models repre- 
sentable as infinite-dimensional analogues of exponential family dis- 
tributions and their canonical conjugate priors. This class contains 
most models studied in the literature, including Dirichlet processes 
and Gaussian process regression models. To derive these results, we 
introduce a representation of infinite-dimensional Bayesian models by 
projective limits of regular conditional probabilities. We show under 
which conditions the nonparametric model itself, its sufficient statis- 
tics, and - if they exist - conjugate updates of the posterior are pro- 
jective limits of their respective finite-dimensional counterparts. The 
results are illustrated both by application to existing nonparametric 
models and by construction of a model on infinite permutations. 

1. Introduction. Nonparametric Bayesian statistics effectively revolves 
around a small number of fundamental models, including the Dirichlet pro- 
cess [16], Gaussian process [50, 56], beta process [25] and gamma process [16]. 
All these models have conjugate posteriors [55]. Since most nonparametric 
Bayesian models are derived from such fundamental, conjugate models, vir- 
tually all nonparametric Bayesian inference is based directly or indirectly 
on conjugacy. The objective of this work is to study the shared properties 
of fundamental models and to characterize the class of models admitting 
conjugate posteriors. 

By nonparametric Bayesian model, we refer to a Bayesian model on an 
infinite-dimensional parameter space [21, 26, 55]. We do not a priori dis- 
tinguish between discrete models (e.g. Dirichlet processes) and continuous 
models (e.g. Gaussian process regression). In addition to conjugacy, models 
such as the Gaussian and Dirichlet processes share another property, the 
existence of marginals in the exponential family. In the case of the Dirichlet 
process, there is a well-known connection between the two properties: Con- 
jugacy of the nonparametric model can be derived directly from the conju- 
gacy of the marginal, finite-dimensional Dirichlet priors [20]. We will show 
in the following how the vague but intuitively appealing link between con- 
jugate posteriors and exponential family marginals in general nonparamet- 
ric Bayesian models can be made precise. If an infinite-dimensional model 
is constructed from finite-dimensional marginal distributions, conjugacy of
the marginals proves sufficient to guarantee a conjugate posterior of the 
nonparametric model. 

The analysis of shared properties of models requires a shared representa- 
tion, which leads almost inevitably to projective limits, i.e. the representa- 
tion of a stochastic process by its finite-dimensional marginal distributions 
[9]. Most representations used in Bayesian nonparametrics are adapted to 
specific models - examples include Lévy processes, stick-breaking
constructions [53], transformed Poisson processes [17], and normalized completely
random measures [27]. The advantages of such model-specific representations 
are that they emphasize useful properties of the model in question, as well 
as their simplicity - more general representations tend to come at the price 
of more technical subtleties involved in their application. Possible choices for 
more general representations of probability measures are densities, charac- 
teristic functions and projective limits. Densities are not applicable for non- 
parametric Bayesian models, both for lack of a suitable translation-invariant 
carrier measure on infinite-dimensional space, and because some important 
models (such as the Dirichlet process) are not dominated [51]. Characteristic 
functions are ill-suited for the questions considered here, since they do not 
live on the actual sample space. 

A projective limit (also called an inverse limit) assembles an infinite- 
dimensional mathematical object from a family of finite-dimensional ob- 
jects [7-9]. Projective limits of probability measures, i.e. Kolmogorov's ex- 
tension theorem and its generalizations, are widely used in the construc- 
tion of stochastic processes: A stochastic process with paths in an infinite- 
dimensional space is represented in terms of its finite-dimensional marginals 
[28]. Since a projective limit representation is not sufficient to specify some
important properties of sample paths, such as continuity of random
functions or $\sigma$-additivity of random measures, we combine projective limits with
the notion of a pullback under a suitable transformation mapping [19]. The 
pullback accounts for those almost sure properties of paths not expressible 
in terms of the projective limit. 

Projective limits can be defined not only for measures, but also for sets, 
functions, and a wide variety of mathematical structures [7-9, 38]. This al- 
lows us to both define projective limits of conditional probabilities, and to 
apply the representation to sufficient statistics and other functions associ- 
ated with a model. In this manner, we obtain a representation of a nonpara- 
metric Bayesian model in terms of a family of finite-dimensional "marginal" 
Bayesian models. The properties of the nonparametric model can be related 
directly to those of the parametric marginals. Application to the questions 
of sufficiency and conjugacy shows that both the sufficient statistics and
the posterior updates of a nonparametric Bayesian model can be expressed 
in terms of their finite-dimensional counterparts. This result in particular 
establishes a large family of models - containing both the Gaussian and the 
Dirichlet process - which can be regarded as a nonparametric analogue of the 
exponential family, in a sense to be made precise in the ensuing discussion. 

The results imply an approach to the construction from scratch of non- 
parametric Bayesian models on a wide range of domains. In this regard, an 
additional appeal of projective limits is the large number of such representa- 
tions available in the mathematical literature, each of which may potentially 
be harvested for the purposes of Bayesian nonparametrics. Examples include
the projective limit/pullback construction of continuous functions used in 
the construction of the Gaussian process [e.g. 2]; a variety of constructions 
of topological and algebraic objects discussed by Bourbaki [7, 8, 9]; the con- 
struction of random coagulation and fragmentation processes [4]; and recent 
constructions of infinite limits of permutations by Kerov et al. [31], and of 
graph limits by Lovász and Szegedy [41].

1.1. Summary of Results. Since projective limits are, by themselves, not 
capable of expressing all properties of stochastic processes such as the Dirich- 
let and Gaussian process, additional steps are required to obtain an applica- 
ble distribution. These steps and their formalization in the literature differ 
widely between models. Since our problem requires a unified formalism, we 
derive a representation in terms of a pullback of the projective limit under 
a measurable embedding. Intuitively, the stochastic process of interest is 
represented by uniquely encoding each of its paths as a path of the projec- 
tive limit process. The resulting representation is applicable to all important 
nonparametric Bayesian models. 

Projective limits and pullbacks preserve a variety of properties of functions 
and set functions. For example, projective limits and pullbacks obtained 
from injective functions are again injective functions. The same holds for 
continuous and measurable mappings, bijections, probability measures and 
regular conditional probabilities. Some of these facts are standard results, 
others are established in the following. In particular, we show: 

(1) The countable projective limit of a projective family of probability 
kernels (regular conditional probabilities) on finite-dimensional spaces 
is a probability kernel on an infinite-dimensional space. The extension 
theorems of Kolmogorov and of Prokhorov can both be generalized 
along these lines (Theorem 1; Corollary 1). Similarly, the pullback of 
a probability kernel is again a probability kernel (Proposition 1). 

A Bayesian model is defined by conditional probabilities. By application of
the previous results to these conditionals, we obtain: 

(2) A projective limit can be applied directly to finite-dimensional Bayesian 
models, resulting in infinite-dimensional Bayesian models on the
corresponding projective limit spaces (Sec. 4.2). Pullbacks also preserve the
structure of the Bayesian model (Sec. 4.3). Both operations commute 
with the computation of posteriors (Diagram (4.3)). 

In other words, nonparametric Bayesian models can be directly constructed 
from finite-dimensional "marginal" Bayesian models. The construction is 
analogous to the construction of stochastic process measures by means of 
projective limits and pullbacks. 

Since projective limits and pullbacks are applicable to measurable func- 
tions, they apply simultaneously to a model and its associated statistics. 

(3) The projective limit of the sufficient statistics (resp. sufficient
$\sigma$-algebras) of the marginal models is a sufficient statistic (resp.
sufficient $\sigma$-algebra) of the infinite-dimensional projective limit model
(Sec. 5). We also show that, if the sufficient $\sigma$-algebras of the
marginals are minimal, the projective limit $\sigma$-algebra is again minimal
sufficient. This holds even if the projective limit model is undominated
(Proposition 3).

The practical utility of conjugate Bayesian models is due to the repre- 
sentability of their posterior parameters as functions of the data and the 
model hyperparameters. We show that the structure and functional form of 
this update process carries over from the marginals to the nonparametric 
model. 

(4) Projective limits and pullbacks of conjugate Bayesian models are con- 
jugate, and in particular, the mapping to the posterior parameter 
of the infinite-dimensional model is the projective limit of the up- 
date mappings of the marginal models (Sec. 6). For the specific case 
in which the finite-dimensional marginals are conjugate exponential 
family models, we obtain a nonparametric analogue of the Diaconis- 
Ylvisaker representation [14] of conjugate parametric models (Corol- 
lary 2). 
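
As a small illustration of statement (4), the following sketch checks, for a finite-dimensional Dirichlet marginal, that the conjugate update commutes with the projection onto a coarser partition. It is only a toy check under stated assumptions: the partition, the base measure values `g0_fine`, the concentration `alpha`, the counts, and the helper names `coarsen` and `dirichlet_update` are all hypothetical and not taken from the paper.

```python
from fractions import Fraction

def coarsen(vec, blocks):
    """Project a vector indexed by a fine partition onto a coarser
    partition by summing the entries within each block."""
    return [sum(vec[i] for i in block) for block in blocks]

def dirichlet_update(prior_params, counts):
    """Conjugate Dirichlet-multinomial update: add the observed cell
    counts to the prior parameters."""
    return [a + n for a, n in zip(prior_params, counts)]

# Hypothetical fine partition J with four cells (all values are assumptions).
alpha = Fraction(2)
g0_fine = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
prior_fine = [alpha * g for g in g0_fine]
counts_fine = [3, 0, 1, 2]

# The coarser partition I merges cells {0, 1} and cells {2, 3}.
blocks = [(0, 1), (2, 3)]

# Update on the fine partition, then project the posterior parameter ...
update_then_project = coarsen(dirichlet_update(prior_fine, counts_fine), blocks)
# ... versus project prior and data first, then update on the coarse partition.
project_then_update = dirichlet_update(
    coarsen(prior_fine, blocks), coarsen(counts_fine, blocks)
)

print(update_then_project == project_then_update)
```

Both orders give the same posterior parameter because the update is additive and the projection is linear; this is the finite-dimensional mechanism that the projective limit of update mappings relies on.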

The results are illustrated by application to three concrete examples: Gaus- 
sian processes (Examples 2 and 3), Dirichlet processes (Example 1 and 
Sec. 7.1), and a Bayesian model on infinite permutations (Sec. 7.2). 

1.2. Related Work. The application of projective limits to statistical 
models was pioneered by Lauritzen [39, 40], to derive a family of parametric 
models which are defined by sequences (rather than averages) of sufficient
statistics and generalize beyond exchangeable observations. In Lauritzen's
work, the "dimensions" of the projective limit describe repeated observations 
from a parametric model, rather than dimensions of sample and parameter 
space as in our case. Nonetheless, if n observations in Lauritzen's "projec- 
tive statistical fields" [40, Chapter IV] are interpreted as a sample of size n 
in a Bayesian nonparametric model, the projective limit aspects of Sec. 3 
below can be regarded as an analogue of Lauritzen's projective fields for 
application to nonparametric Bayesian models. 

Conjugate analysis in the finite-dimensional, parametric case, i.e. for dom- 
inated models, is the subject of a substantial literature [e.g. 12-14]. Bernardo 
and Smith [3] give a concise overview. It is also well known that almost all 
nonparametric Bayesian models are conjugate [55]; if the model is undom- 
inated, Bayes' theorem is not applicable, and conjugacy is often the only 
way to represent the posterior. Other models indirectly rely on conjugacy: 
The popular Dirichlet process mixture model [1, Example 4] does not have 
a conjugate posterior, but is amenable to Gibbs sampling only because the 
Dirichlet process law of the mixing measure is conjugate. However, conjugacy 
of nonparametric Bayesian models has not so far been analyzed as a struc- 
tural property, with one notable exception: In the special case of sequential 
independent increment processes, for which a class of models with
exponential family marginals is discussed in detail by Küchler and Sørensen [35],
the existence of conjugate posteriors is studied by Magiera and Wilczyński
[42]. Thibaux and Jordan [54] draw on a similar insight and invoke a
conjugacy argument to relate the Indian buffet process model of Griffiths and
Ghahramani [22] to the beta process of Hjort [25]. 

1.3. Outline. We develop a representation of stochastic processes suit- 
able for our purposes in Sec. 2. Projective limits and pullbacks are then 
applied to conditional probabilities in Sec. 3, which facilitates their applica- 
tion to Bayesian models in Sec. 4. From the representation of nonparametric 
Bayesian models so obtained, we derive results on their sufficient statistics 
in Sec. 5, and on conjugate posteriors in Sec. 6. Two detailed examples in 
Sec. 7 illustrate the approach and results. Since projective limits of func- 
tions and pullbacks of measures are not commonly used in statistics, a brief 
summary of relevant facts is provided in Appendix A. 

1.4. Notation and Assumptions. All random variables are in the following
assumed to share an abstract probability space $(\Omega, \mathcal{A}, \mathbb{P})$
as common domain. We will frequently have to distinguish spaces of different
dimensions, which are indexed by subscripts as $\mathcal{X}_I$, $\mathcal{T}_I$,
etc. All mappings, $\sigma$-fields and other quantities on these spaces are
indexed accordingly. We use superscripts $x^{(j)}$ to denote elements of
sequences or repetitive observations. For any measure $\nu$, a superscript
$\nu^*$ indicates the corresponding outer measure. Observations are generally
assumed exchangeable. Topological spaces are assumed to be Polish spaces, i.e.
complete, separable and metrizable spaces, unless expressly stated otherwise.
We refer to a measurable space as standard Borel if it is the Borel space
generated by a Polish topology. As the underlying spaces are Polish, all
conditional probabilities $P[X|\mathcal{C}]$ are assumed to be regular
conditional probabilities (probability kernels).

2. Construction of Stochastic Processes. We will briefly survey the 
construction of stochastic processes and introduce some relevant definitions. 
The presentation assumes familiarity with the terminology of projective lim- 
its, which is used here in the sense of Bourbaki [7, 8, 9]. A more detailed 
summary of projective limits and pullbacks is given in Appendix A. 

2.1. Projective Limit Notation. Let $(D, \preceq)$ be a partially ordered,
directed set. We assume $D$ to be countable throughout. Let
$(\mathcal{X}_I, \mathcal{B}_I, f_{JI})_{I \preceq J \in D}$, or
$(\mathcal{X}_I, \mathcal{B}_I, f_{JI})_D$ for short, be a projective system of
topological measurable spaces indexed by $D$. That is, $\mathcal{X}_I$ are
topological spaces, $\mathcal{B}_I$ their Borel $\sigma$-algebras, and
$f_{JI} : \mathcal{X}_J \to \mathcal{X}_I$ are continuous generalized
projections; the mappings are called generalized projections if they satisfy

(2.1) $f_{II} = \mathrm{Id}_{\mathcal{X}_I}$ and $f_{KI} = f_{JI} \circ f_{KJ}$ whenever $I \preceq J \preceq K$.

Denote by $\mathcal{X}_D$ the projective limit space. The mappings $f_{JI}$
induce a family of unique generalized projection mappings
$f_I : \mathcal{X}_D \to \mathcal{X}_I$. The space $\mathcal{X}_D$ is endowed
with the smallest topology $\mathrm{Top}_D$ which makes all $f_I$ continuous.
$\mathrm{Top}_D$ is called the projective limit topology, and generates the
projective limit Borel $\sigma$-algebra $\mathcal{B}_D$. A family $(P_I)_D$ of
probability measures on the spaces $\mathcal{X}_I$ is called projective if
$f_{JI}(P_J) = P_I$ whenever $I \preceq J$. By the extension theorem of
Kolmogorov and Bochner (App. A, Theorem 4), any projective family defines a
unique probability measure $P_D$ on $(\mathcal{X}_D, \mathcal{B}_D)$ which
satisfies $P_I = f_I(P_D)$ for all $I \in D$. We refer to this measure, also
denoted $P_D = \varprojlim (P_I)_D$, as the projective limit of $(P_I)_D$, and
to the measures $P_I$ as the marginals of $P_D$. Intuitively, the measures
$P_I$ are probability distributions on finite-dimensional spaces, and $P_D$ is
a joint distribution of a stochastic process $(X_I)_D$ on the
infinite-dimensional space $\mathcal{X}_D$.
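
The two conditions above, the composition rule (2.1) and projectivity of a family of measures, can be checked mechanically on a toy system. The sketch below is only an illustration under assumptions that are not part of the paper: the spaces are $\{0,1\}^I$ for nested index tuples `I`, `J`, `K`, the marginals are i.i.d. Bernoulli product measures, and the helper names `projection`, `pushforward` and `bernoulli_product` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def projection(J, I):
    """Coordinate projection f_JI: keep the coordinates indexed by I
    from a point indexed by J (I must be a sub-tuple of J)."""
    positions = [J.index(i) for i in I]
    return lambda x: tuple(x[p] for p in positions)

def pushforward(measure, f):
    """Image measure f(P): move the mass of each point x to f(x)."""
    image = {}
    for x, mass in measure.items():
        image[f(x)] = image.get(f(x), 0) + mass
    return image

def bernoulli_product(I, p):
    """Toy marginal P_I: i.i.d. Bernoulli(p) coordinates on {0,1}^I."""
    measure = {}
    for x in product((0, 1), repeat=len(I)):
        mass = Fraction(1)
        for bit in x:
            mass *= p if bit else 1 - p
        measure[x] = mass
    return measure

I, J, K = (1,), (1, 2), (1, 2, 3)   # index sets ordered by inclusion
p = Fraction(1, 3)
points_K = list(product((0, 1), repeat=len(K)))

# (2.1), first part: f_II is the identity.
identity_ok = all(projection(K, K)(x) == x for x in points_K)
# (2.1), second part: f_KI = f_JI o f_KJ on every point of X_K.
compose_ok = all(
    projection(K, I)(x) == projection(J, I)(projection(K, J)(x))
    for x in points_K
)
# Projectivity of the measures: f_JI(P_J) = P_I.
projective_ok = (
    pushforward(bernoulli_product(J, p), projection(J, I))
    == bernoulli_product(I, p)
)

print(identity_ok, compose_ok, projective_ok)
```

Exact rational arithmetic is used so that the equality of measures is tested exactly rather than up to floating-point error.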

The projective limit space $\mathcal{X}_D$ is a subset of the product space
$\prod_{I \in D} \mathcal{X}_I$. If $\mathrm{pr}_I$ denotes the canonical
projection onto $\mathcal{X}_I$ in the product space, the canonical mappings
$f_I$ are the restrictions $f_I = \mathrm{pr}_I|_{\mathcal{X}_D}$. It is often
useful to regard the elements $x_D$ of $\mathcal{X}_D$ as functions
$x_D : D \to \bigcup_{I \in D} \mathcal{X}_I$, or more precisely, as functions
on $D$ taking values $x(I) \in \mathcal{X}_I$. In the context of nonparametric
Bayesian estimation, the indices $I \in D$ may be thought of as covariates or
sets of covariates and the function values $x_I = x(I)$ as measurements, if
$\mathcal{X}_D$ represents the observation space of the model. If
$\mathcal{X}_D$ is a parameter space, continuous real-valued functions $x_D$
may represent regressors, set functions $x_D$ may represent density
estimates, etc.

2.2. Stochastic Processes. A stochastic process is in general a collection
$(X_I)_{I \in D}$ of random variables, indexed by an infinite set $D$. Hence,
if $P_D = \varprojlim (P_I)_D$ is a projective limit measure with marginals
$P_I$, the family $(X_I)_D$ of random variables distributed according to the
measures $P_I$ is a stochastic process indexed by $D$. Conversely, any
stochastic process can in principle be regarded as the projective limit of its
marginals on suitably chosen subspaces. However, constructions of stochastic
processes as projective limits have to address two fundamental technical
problems:

(a) Uncountable index sets. An event $A \subset \mathcal{X}_D$ is measurable
under $P_D$ only if it depends on an at most countable subset $D' \subset D$
of coordinates [e.g. 5, Theorem 36.3]. In other words, unless $D$ is
countable, singletons are not measurable in the projective limit space, and
the projective limit measure $P_D$ is not useful for most applications.

(b) Infinitary properties of sample paths. If the spaces $\mathcal{X}_I$ in
the projective system are finite-dimensional, the projective limit
construction can only express properties of the random functions $X_D$ that
are finitary, such as non-negativity or monotonicity of real-valued
functions, or finite additivity of set functions.

Problem (a) means, for example, that projective limits can directly define
a useful measure on functions $\mathbb{Q} \to \mathbb{R}$, but not on
functions $\mathbb{R} \to \mathbb{R}$, since the space
$\mathbb{R}^{\mathbb{R}}$ of all functions $\mathbb{R} \to \mathbb{R}$ has
uncountable dimension. Problem (b) implies, for example, that a projective
limit construction of random set functions can define a sample space
consisting of all charges (finitely additive probabilities), but not a sample
space containing exactly all probability measures, which would require the
projective limit to express countable additivity.

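Both problems (a) and (b) can be jointly addressed in an elegant manner
by means of pullbacks under suitable functions. Given a space $\mathcal{X}$, a
measure space $(\mathcal{Y}, \mathcal{B}_{\mathcal{Y}}, \nu)$ and a function
$\mathcal{J} : \mathcal{X} \to \mathcal{Y}$, the pullback of $\nu$ under
$\mathcal{J}$ is the measure $\tilde{\nu}$ on
$(\mathcal{X}, \mathcal{J}^{-1}\mathcal{B}_{\mathcal{Y}})$ satisfying
$\mathcal{J}(\tilde{\nu}) = \nu$. The pullback measure $\tilde{\nu}$ is
uniquely defined whenever the image $\mathcal{J}(\mathcal{X}) \subset
\mathcal{Y}$ has full outer measure under $\nu$, that is if
$\nu^*(\mathcal{J}(\mathcal{X})) = \nu(\mathcal{Y})$ - see App. A.2 for more
details. The most common example of a pullback is the restriction of a measure
to a (possibly non-measurable) subspace, in which case $\mathcal{X} \subset
\mathcal{Y}$ is an arbitrary subset and $\mathcal{J} : \mathcal{X} \to
\mathcal{Y}$ the canonical inclusion map. The $\sigma$-algebra
$\mathcal{J}^{-1}\mathcal{B}_{\mathcal{Y}}$ is then precisely the subspace
$\sigma$-algebra $\mathcal{B}_{\mathcal{Y}} \cap \mathcal{X}$. Hence, if $\nu$
is a probability measure on $\mathcal{Y}$, and if the subspace has outer
measure $\nu^*(\mathcal{X}) = 1$, the pullback $\tilde{\nu}$ exists and is the
restriction of $\nu$ to $(\mathcal{X}, \mathcal{B}_{\mathcal{Y}} \cap
\mathcal{X})$.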
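
The restriction to a non-measurable subset can be made concrete on a finite toy space. The following sketch is an illustration only, under assumptions that do not come from the paper: the four-point space, the two-atom $\sigma$-algebra, the masses in `atom_mass`, and the helper names are all hypothetical. The subset `X` is not a union of atoms, hence not measurable, yet it has full outer measure, so the pullback under the inclusion map is well defined on the trace $\sigma$-algebra.

```python
from fractions import Fraction
from itertools import combinations

# Toy measure space (assumption): Y = {0, 1, 2, 3} with the sigma-algebra
# generated by the two atoms {0, 1} and {2, 3}.
atoms = [frozenset({0, 1}), frozenset({2, 3})]
atom_mass = {atoms[0]: Fraction(2, 5), atoms[1]: Fraction(3, 5)}
Y = frozenset({0, 1, 2, 3})

def measurable_sets(atoms):
    """All unions of atoms, i.e. the sigma-algebra generated by a finite partition."""
    sets = []
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            sets.append(frozenset().union(*combo))
    return sets

def nu(B):
    """Measure of a measurable set B (a union of atoms)."""
    return sum((m for a, m in atom_mass.items() if a <= B), Fraction(0))

def outer_measure(S):
    """nu*(S): the smallest measure of a measurable set that covers S."""
    return min(nu(B) for B in measurable_sets(atoms) if S <= B)

X = frozenset({0, 2})   # meets both atoms but contains neither: non-measurable
full_outer = outer_measure(X) == nu(Y)

# Pullback under the inclusion map: the trace set B & X receives mass nu(B).
# This is well defined because X has full outer measure, so two measurable
# sets with the same trace on X differ only by a nu-null set.
pullback = {}
for B in measurable_sets(atoms):
    pullback[B & X] = nu(B)

print(full_outer, pullback[frozenset({0})], pullback[frozenset({2})])
```

If `X` were replaced by a set missing an atom of positive mass, for example `frozenset({0})`, the outer measure would drop below `nu(Y)` and the restriction would no longer be a probability measure.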

To construct stochastic processes, we will specifically consider pullbacks
under embedding maps. Let $\phi : \tilde{\mathcal{X}} \to \mathcal{X}$ be a
mapping between topological spaces. Such a mapping is called an embedding if,
regarded as a mapping onto its image, it is a homeomorphism. Analogously, we
refer to $\phi$ as a Borel embedding if it constitutes a Borel isomorphism of
its domain and its image $(\Gamma, \mathcal{B}(\mathcal{X}) \cap \Gamma)$. A
definition of a stochastic process suitable for our questions in Bayesian
nonparametrics is the following:

Definition 1. Let $(\tilde{\mathcal{X}}, \mathcal{B}(\tilde{\mathcal{X}}),
\tilde{P})$ be a topological measure space and $(\mathcal{X}_I,
\mathcal{B}_I, P_I)_D$ a projective system of standard Borel spaces with
countable, directed index set $D$. Then $\tilde{P}$ is called a countably
representable stochastic process if it is the pullback of the projective limit
measure $P_D := \varprojlim (P_I)_D$ under a Borel embedding
$\phi : \tilde{\mathcal{X}} \to \Gamma \subset \mathcal{X}_D$.

To be asymptotically identifiable, a model can have at most a countable
number of degrees of freedom, which motivates the restriction to sample
paths of countable complexity implicit in Definition 1: The indices $I \in D$
of a projective limit can be thought of as dimensions or degrees of freedom.
Hence, the sample space $\tilde{\mathcal{X}}$ of a stochastic process with
countably many degrees of freedom can be embedded into a suitably chosen
projective limit space $\mathcal{X}_D$ with countable index set.

The special case in which $\nu$ is a projective limit measure on an
uncountable product space $\mathcal{Y} := \mathcal{X}_D$, constructed from
Euclidean spaces $\mathcal{X}_I = \mathbb{R}^I$, and $\tilde{\mathcal{X}}$ is
e.g. the subset of continuous functions, is known in stochastic process theory
as "Doob's separability theorem". In this case, the pullback $\tilde{P}$ is
called a "separable modification" of $\nu$ [15]. The index set $D$ is the set
of all finite subsets of the "separant", a dense countable subset of
$\mathbb{R}_+$. See also [5, Chapter 38].

The intuition that sample paths of $\tilde{P}$ (the elements of
$\tilde{\mathcal{X}}$) are uniquely represented by their embeddings into
$\mathcal{X}_D$ can be helpful in establishing that a given mapping $\phi$ is
indeed a Borel embedding: Suppose that a measurable map $\phi$ is given. As a
mapping onto its image, it is trivially surjective, so what remains to be
established for Borel isomorphy is the existence of a measurable inverse. If
the elements of $\tilde{\mathcal{X}}$ are uniquely represented by their
embeddings, then $\phi$ is injective. In most settings, the mapping $\phi$ can
be directly derived from a suitable representation result, such as the
representation of continuous functions by their values on countable subsets as
mentioned above, or the representation of measures by their values on a
generating algebra of sets (by Carathéodory's extension theorem). If
additionally both $\tilde{\mathcal{X}}$ and $\Gamma$ are standard Borel
spaces, Borel isomorphy follows automatically, since measurable bijections
between standard Borel spaces are bimeasurable [28, Theorem A1.3].

Example 1 (Dirichlet process). Suppose that $\tilde{P}$ is a Dirichlet
process $\mathrm{DP}(\alpha G_0)$ over a standard Borel space
$(V, \mathcal{B}_V)$. The spaces $\mathcal{X}_I$ can be chosen as
finite-dimensional simplices $\triangle_I \subset \mathbb{R}^{|I|}$ indexed by
measurable partitions $I = (A_1, \ldots, A_{|I|})$ of the space $V$. The
marginals $P_I(X_I)$ are Dirichlet distributions on the simplices. The
projective limit is the space of all charges defined on a specific countable
algebra $\mathcal{Q} \subset \mathcal{B}_V$ which generates $\mathcal{B}_V$.
The space $\tilde{\mathcal{X}}$ is the space of all probability measures on
$\mathcal{B}_V$, and its image $\Gamma = \phi(\tilde{\mathcal{X}})$ is the set
of probability measures on the subalgebra $\mathcal{Q}$. For a given measure
$x$ on $\mathcal{B}_V$, the image $\phi(x)$ is the restriction of $x$ to
$\mathcal{Q}$. By the Carathéodory extension theorem, $\phi$ is injective.
Whether $P_D$ admits a pullback under $\phi$ depends on the parametrization of
the marginals: If $G_0$ is a charge on $\mathcal{Q}$, and each Dirichlet
marginal has parameter $\alpha \cdot f_I(G_0)$ for some fixed $\alpha > 0$,
the Dirichlet distributions form a projective family. The projective limit
satisfies $P_D(\Gamma) = 1$ if and only if $G_0$ is countably additive.
Sec. 7.1 revisits this example in detail.
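
The projectivity of the Dirichlet marginals in Example 1 rests on the aggregation property of the Dirichlet distribution: merging cells of the partition yields a Dirichlet distribution whose parameters are the sums of the merged parameters. The sketch below checks a necessary consequence of this exactly, namely that the mean and covariance of the merged coordinates agree with those of the coarse Dirichlet marginal. Matching two moments does not by itself prove projectivity; the partition, the values in `g0_fine`, `alpha`, and the helper names are hypothetical choices for illustration.

```python
from fractions import Fraction

def coarsen(vec, blocks):
    """Sum the entries of a vector within each block of merged cells."""
    return [sum(vec[i] for i in block) for block in blocks]

def dirichlet_mean_cov(params):
    """Exact mean vector and covariance matrix of Dirichlet(params)."""
    a0 = sum(params)
    mean = [a / a0 for a in params]
    scale = a0 * a0 * (a0 + 1)
    cov = [
        [(a_i * (a0 - a_i) if i == j else -a_i * a_j) / scale
         for j, a_j in enumerate(params)]
        for i, a_i in enumerate(params)
    ]
    return mean, cov

# Hypothetical base measure on a fine partition J (values are assumptions).
alpha = Fraction(3)
g0_fine = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 4), Fraction(1, 4)]
blocks = [(0, 1), (2, 3)]            # coarse partition I merges cells pairwise

params_fine = [alpha * g for g in g0_fine]                     # alpha * f_J(G0)
params_coarse = [alpha * g for g in coarsen(g0_fine, blocks)]  # alpha * f_I(G0)

# Moments of the merged coordinates under the fine Dirichlet marginal.
mean_fine, cov_fine = dirichlet_mean_cov(params_fine)
mean_merged = coarsen(mean_fine, blocks)
cov_merged = [
    [sum(cov_fine[i][j] for i in bi for j in bj) for bj in blocks]
    for bi in blocks
]

# Moments of the coarse Dirichlet marginal.
mean_coarse, cov_coarse = dirichlet_mean_cov(params_coarse)
moments_match = mean_merged == mean_coarse and cov_merged == cov_coarse

print(moments_match)
```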

Example 2 (Gaussian Process). To obtain a Gaussian process measure
on the set $\tilde{\mathcal{X}} := C(\mathbb{R}_+, \mathbb{R})$ of continuous
functions $\mathbb{R}_+ \to \mathbb{R}$, a projective limit is constructed as
follows: Choose $D$ as the set of all finite subsets $I$ of $\mathbb{Q}_+$,
ordered by inclusion, and define $\mathcal{X}_I := \prod_{i \in I}
\mathbb{R}$. Let $f_{JI} := \mathrm{pr}_{JI}$ be the coordinate projections in
Euclidean space, and $(P_I)_D$ a projective family of multivariate Gaussian
distributions. The projective limit space is $\mathcal{X}_D =
\mathbb{R}^{\mathbb{Q}_+}$, and the projective limit measure $P_D$ can be
regarded as a discrete-time Gaussian process indexed by $\mathbb{Q}_+$. We
embed $\tilde{\mathcal{X}}$ into $\mathbb{R}^{\mathbb{Q}_+}$ by means of the
restriction map $\phi : x \mapsto x|_{\mathbb{Q}_+}$. The mapping $\phi$ is a
Borel isomorphism as required in Definition 1: As a canonical inclusion map,
$\phi$ is continuous and hence measurable. Since the representation of $x$ by
its restriction is unique, $\phi$ is injective. The $\sigma$-algebra
$\phi^{-1}\mathcal{B}_D$ induced by $\phi$ on $C(\mathbb{R}_+, \mathbb{R})$
coincides with the Borel $\sigma$-algebra generated by the topology of compact
convergence [19, Section 4540]. Hence, $\tilde{\mathcal{X}}$ is standard
Borel, and $\phi$ bimeasurable. The requirement $P_D(\tilde{\mathcal{X}}) = 1$
for the existence of the pullback measure is not generally satisfied for
arbitrary Gaussian marginals $P_I$. It can, however, be related to the
parameters of the marginals. A prototypical result is Kolmogorov's continuity
theorem [2, Theorem 39.3]: If the expectation under $P_D$ satisfies
$E[|X_i - X_j|^{\alpha}] \le \gamma |i - j|^{1+\beta}$ for all $i, j \in
\mathbb{Q}_+$ and some fixed $\alpha, \beta, \gamma \in \mathbb{R}_{>0}$, then
$P_D(C(\mathbb{R}_+, \mathbb{R})) = 1$. An example to the contrary is obtained
for marginals satisfying $\mathrm{Cov}[X_i, X_j] = \delta_{ij}$. The resulting
Gaussian white noise process is almost surely discontinuous, and hence
$P_D(\tilde{\mathcal{X}}) \neq 1$.
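
The moment condition of Kolmogorov's continuity theorem can be evaluated in closed form for centered Gaussian marginals, since the fourth moment of a centered Gaussian increment is three times its squared variance. The sketch below does this on a finite grid in $\mathbb{Q}_+$ for two covariance functions: a Brownian-motion covariance $\min(s, t)$, which is an assumption chosen for illustration and not taken from the example, and the white-noise covariance $\delta_{ij}$ mentioned above. A check on a finite grid only illustrates the condition with $\alpha = 4$, $\beta = 1$, $\gamma = 3$; it is not a proof of continuity or discontinuity. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def brownian_cov(s, t):
    """Covariance min(s, t) of Brownian motion (illustrative assumption)."""
    return min(s, t)

def white_noise_cov(s, t):
    """Covariance delta_{st} of Gaussian white noise."""
    return Fraction(1) if s == t else Fraction(0)

def cov_matrix(kernel, index_set):
    """Covariance matrix of the centered Gaussian marginal on a finite index set."""
    return [[kernel(s, t) for t in index_set] for s in index_set]

def fourth_moment_of_increment(kernel, s, t):
    """E|X_s - X_t|^4 for a centered Gaussian pair: 3 * Var(X_s - X_t)^2."""
    var = kernel(s, s) + kernel(t, t) - 2 * kernel(s, t)
    return 3 * var * var

def kolmogorov_bound_holds(kernel, points, gamma=3):
    """Check E|X_s - X_t|^4 <= gamma * |s - t|^2 (alpha = 4, beta = 1) on all pairs."""
    return all(
        fourth_moment_of_increment(kernel, s, t) <= gamma * (s - t) ** 2
        for s, t in combinations(points, 2)
    )

J = [Fraction(k, 8) for k in range(1, 9)]   # a finite subset of Q+
I = J[::2]                                  # a smaller index set, I subset of J

# Projectivity: the coordinate projection of a centered Gaussian has the
# corresponding covariance submatrix, which equals the kernel evaluated on I.
K_J = cov_matrix(brownian_cov, J)
keep = range(0, len(J), 2)
submatrix = [[K_J[a][b] for b in keep] for a in keep]
projective_ok = submatrix == cov_matrix(brownian_cov, I)

brownian_ok = kolmogorov_bound_holds(brownian_cov, J)
white_noise_ok = kolmogorov_bound_holds(white_noise_cov, J)

print(projective_ok, brownian_ok, white_noise_ok)
```

For the Brownian covariance the increment variance is $|s - t|$, so the bound holds with equality; for white noise the increment variance is the constant 2, so the fourth moment does not shrink as $|s - t| \to 0$ and no choice of constants can satisfy the condition.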

3. Projective Limits of Conditional Probabilities. In this section, 
we apply the projective limit approach to conditional probabilities. By means 
of Theorem 1 below, a conditional probability on an infinite-dimensional 
space can be assembled as a projective limit of conditional probabilities 
on finite-dimensional spaces, in a similar manner as a probability measure 
can be specified as a projective limit by means of the Kolmogorov-Bochner 
extension theorem. 

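3.1. Construction Results. Let $(\mathcal{X}_I, \mathcal{B}_I, f_{JI})_D$ be
a projective system of standard Borel spaces. For each $I \in D$, let
$P_I[X_I|\mathcal{C}_I]$ be a regular conditional probability on
$(\mathcal{X}_I, \mathcal{B}_I)$. More precisely, $X_I : \Omega \to
\mathcal{X}_I$ is a random variable, $\mathcal{C}_I \subset \mathcal{A}$ is a
$\sigma$-subalgebra on the abstract probability space $\Omega$, and
$P_I[\,\cdot\,|\mathcal{C}_I](\,\cdot\,) : \mathcal{B}_I \times \Omega \to
[0, 1]$ is a probability kernel.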

The projections $f_{JI}$ immediately generalize from probability measures to
conditional probabilities by means of

(3.1) $(f_{JI} P_J)[X_I \in \,\cdot\,|\mathcal{C}_J] := P_J[X_J \in f_{JI}^{-1}\,\cdot\,|\mathcal{C}_J]$.

The projector acts only on the first argument of the probability kernel.
To generalize the notion of a projective family, the second argument has
to be taken into account as well: Consider a parametric family
$P_I[X_I|\Theta_I]$, i.e. each $\mathcal{C}_I$ is generated by a parameter
random variable $\Theta_I$. Typically, if $\Theta_J$ parametrizes a
high-dimensional random variable $X_J$ and $\Theta_I$ a lower-dimensional
variable $X_I$, we would assume the information contained in $\Theta_I$ to be
a subset of the information contained in $\Theta_J$. The concept can be
expressed in very general terms by assuming that the $\sigma$-algebras
$\mathcal{C}_I$ are ordered in accordance with the index set, i.e.
$\mathcal{C}_I \subset \mathcal{C}_J$ whenever $I \preceq J$. In analogy to
the index set, we refer to such an ordered family of $\sigma$-algebras as
directed.

Definition 2 (Projective family of conditional probabilities). Let
$(\mathcal{C}_I)_D$ be a directed family of $\sigma$-algebras. A family
$(P_I[X_I|\mathcal{C}_I])_D$ of probability kernels on the projective system
$(\mathcal{X}_I, \mathcal{B}_I, f_{JI})_D$ is called projective if

(3.2) $(f_{JI} P_J)[\,\cdot\,|\mathcal{C}_J] =_{\text{a.e.}} P_I[\,\cdot\,|\mathcal{C}_I]$ whenever $I \preceq J$.

Projectivity of conditionals is a stronger condition than projectivity of
measures: We have $P(A) = \int_{\Omega} P[A|\mathcal{C}](\omega)\,
d\mathbb{P}(\omega)$ for any $\mathcal{C} \subset \mathcal{A}$, and hence
$P_J[f_{JI}^{-1}A|\mathcal{C}_J](\omega) =_{\text{a.e.}}
P_I[A|\mathcal{C}_I](\omega)$ implies $P_J(f_{JI}^{-1}A) = P_I(A)$. Therefore,
projective conditionals imply projective measures, but the converse only holds
under additional conditions (cf. Lemma 2). If the conditional distributions of
random variables $X_I$ are projective given one directed family of
$\sigma$-algebras, the same may not be true for another family, so the
conditional projector is effectively parametrized by the family
$(\mathcal{C}_I)_D$.

Theorem 1 (Projective limits of conditional probabilities). Let $D$ be
a countable directed set. Let $(P_I[X_I|\mathcal{C}_I])_D$ be a projective
family of probability kernels on a projective system $(\mathcal{X}_I,
\mathcal{B}_I, f_{JI})_D$ of Polish measurable spaces. Then there exists a
unique (up to equivalence) probability kernel, denoted
$P_D[\,\cdot\,|\mathcal{C}_D]$, which satisfies

(3.3) $(f_I P_D)[\,\cdot\,|\mathcal{C}_D] =_{\text{a.e.}} P_I[\,\cdot\,|\mathcal{C}_I]$ for all $I \in D$,

and is measurable with respect to $\mathcal{C}_D := \sigma(\mathcal{C}_I; I \in D)$.

As the proof below shows, $P_D[\,\cdot\,|\mathcal{C}_D]$ can be regarded as
the projective limit of the measurable, measure-valued functions $\omega
\mapsto P_I[\,\cdot\,|\mathcal{C}_I](\omega)$. In analogy to probability
measures, we refer to the conditionals $P_I[X_I|\mathcal{C}_I]$ as the
marginal conditional probabilities of $P_D[X_D|\mathcal{C}_D]$, or marginals
for short.

Proof. The proof relies on the simple fact that measurability of mappings
is preserved under projective limits (as is continuity [7, 1.4.4]):

Lemma 1. Let $(\Omega, \mathcal{A})$ be a measurable space,
$(\mathcal{X}_I, \mathcal{B}_I, f_{JI})_D$ a projective family of measurable
spaces with projective limit $(\mathcal{X}_D, \mathcal{B}_D)$, and
$(w_I : \Omega \to \mathcal{X}_I)_D$ a projective family of measurable
mappings. Then the projective limit $w_D := \varprojlim w_I$ is a measurable
mapping $\Omega \to \mathcal{X}_D$.

Proof (Lemma 1). Since $(w_I)_D$ is projective, $w_I = f_I \circ w_D$. By
measurability of $w_I$, the composition $f_I \circ w_D$ is
$\mathcal{A}$-$\mathcal{B}_I$-measurable for all $I \in D$. Since the
canonical mappings $f_I$ generate $\mathcal{B}_D$, $w_D$ is
$\mathcal{A}$-$\mathcal{B}_D$-measurable. □

The regular conditional probabilities $P_I[X_I|\mathcal{C}_I]$ can be
regarded as a family of random measures, i.e. as measurable mappings
$P_I : \Omega \to M(\mathcal{X}_I)$ defined by $\omega \mapsto
P_I[X_I|\mathcal{C}_I](\omega)$. To prove Theorem 1, we argue that this family
is projective (in the sense of App. A, Lemma 9), with the desired conditional
probability $P_D[X_D|\mathcal{C}_D]$ as its projective limit. However, we have
to account for the fact that projectivity of the mappings holds only almost
everywhere.

Denote by $M(\mathcal{X}_I)$ the set of probability measures on
$\mathcal{X}_I$. The continuous mappings $f_{JI}$ induce, by means of
$P_J \mapsto f_{JI}(P_J)$, a continuous projection
$f_{JI} : M(\mathcal{X}_J) \to M(\mathcal{X}_I)$. With respect to these
projectors, the measurable mappings $P_I : \Omega \to M(\mathcal{X}_I)$ are
projective almost everywhere: For any pair $I \preceq J$ of indices, (3.2)
holds up to a null set $N_{JI} \subset \Omega$ of exceptions. Write
$N := \bigcup_{I \preceq J} N_{JI}$ for the aggregate null set (a null set as
a countable union of null sets, since $D$ is countable), and
$N^c := \Omega \setminus N$ for its complement. The restricted mappings
$P_I|_{N^c} : N^c \to M(\mathcal{X}_I)$ form a projective family of
$\mathcal{C}_D \cap N^c$-measurable mappings, and by Lemma 1 have a unique,
measurable projective limit $P_D^{N^c} : N^c \to M(\mathcal{X}_D)$. This
mapping satisfies

(3.4) $(f_I P_D^{N^c})(\omega) = f_I(P_D^{N^c}(\omega)) = P_I(\omega)$ for all $\omega \in N^c$.

The first identity is due to the definition of projective limit mappings; the
second follows by observing that, for any $\omega \in N^c$,
$(P_I(\omega))_D$ is a projective family of probability measures with
projective limit measure $P_D^{N^c}(\omega)$.

As a countable projective limit of Polish spaces, $\mathcal{X}_D$ is Polish,
and so is $M(\mathcal{X}_D)$. Therefore, the $\mathcal{C}_D \cap
N^c$-measurable function $P_D^{N^c} : N^c \to M(\mathcal{X}_D)$ has an
extension to a measurable function $P_D : \Omega \to M(\mathcal{X}_D)$
[29, Theorem 12.2]. This function $P_D(\omega) =:
P_D[X_D|\mathcal{C}_D](\omega)$ is a regular conditional probability on
$\mathcal{X}_D$, and satisfies (3.3) $\mathbb{P}$-almost everywhere. □

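Like projective limits, pullbacks generalize from measures to conditional
probabilities.

Proposition 1 (Pullback of regular conditional probabilities). Let
$P[X|\mathcal{C}]$ be a regular conditional probability on a standard Borel
space $\mathcal{X}$. Let $\tilde{\mathcal{X}}$ be a Hausdorff space,
$\phi : \tilde{\mathcal{X}} \to \mathcal{X}$ injective, and
$\tilde{\mathcal{B}} := \phi^{-1}\mathcal{B}(\mathcal{X})$ the induced
$\sigma$-algebra on $\tilde{\mathcal{X}}$. Denote by $\tilde{\Omega} \subset
\Omega$ the set of all $\omega$ satisfying
$P^*[\phi(\tilde{\mathcal{X}})|\mathcal{C}](\omega) = 1$. Then
$\nu(A, \omega) := P[\phi(A)|\mathcal{C}](\omega)$ is a probability kernel on
$\tilde{\mathcal{X}}$, and can be regarded as a regular conditional
probability of the random variable $\tilde{X} := \phi^{-1} \circ
X|_{\tilde{\Omega}}$, given $\mathcal{C} \cap \tilde{\Omega}$.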

Clearly, $\tilde{\Omega}$ may be empty. A pullback construction of a model will therefore
typically involve a result characterizing either $\tilde{\Omega}$ or a subset of $\tilde{\Omega}$. The
characterization is usually expressed as the image of $\tilde{\Omega}$ under a suitable
parameter random variable, i.e. as a result describing a set of "parameter
values" for which the model concentrates on $\tilde{\mathcal{X}}$. An example of such a
characterization is the Kolmogorov continuity theorem mentioned in Example 2:
The Gaussian process in the example can be parametrized by its mean and
covariance functions, and the theorem specifies a subset of parameters for
which the pullback exists. Lemmas 5, 6 and 8 in Sec. 7 are further examples
of such results.

Proof. Since $\tilde{\mathcal{B}}$ is the $\sigma$-algebra induced by $\phi$ and $\phi$ is injective, the
inverse $\phi^{-1}$ is automatically measurable with respect to $\mathcal{B}(\mathcal{X}) \cap \phi(\tilde{\mathcal{X}})$, so the
restriction of the mapping $\phi^{-1} \circ X$ is indeed a valid $\tilde{\mathcal{X}}$-valued random variable
on $\tilde{\Omega}$. The result follows by a simple point-wise application of pullbacks to
the measures $P[\,.\,|\mathcal{C}](\omega)$ for $\omega \in \tilde{\Omega}$. □

The combination of Theorem 1 and Lemma 1 results in a two-stage approach
to the construction of regular conditional probabilities, analogous to
the two-stage construction of stochastic processes in the sense of Definition
1: First construct a suitable projective limit $P_D[X_D|\mathcal{C}_D]$, and then pull back
to a (possibly non-measurable) subspace $\tilde{\mathcal{X}} \subset \mathcal{X}_D$, or to a space $\tilde{\mathcal{X}}$ embedded
into $\mathcal{X}_D$ by a Borel embedding $\phi$.

Both steps can be combined into a single step under an additional assumption
- namely that the embedding of $\tilde{\mathcal{X}}$, i.e. the image $\phi(\tilde{\mathcal{X}})$, is actually
measurable in $\mathcal{X}_D$. The extension result obtained for this case can be
regarded as a conditional probability analogue of the well-known projective
limit theorem of Prokhorov [9, IX.4.2], just as Theorem 1 is analogous to
the extension theorems of Kolmogorov and Bochner.

Corollary 1 (Prokhorov extension). Let $(\mathcal{X}_I, \mathcal{B}_I, f_{JI})_D$ be a countably
indexed projective system of Polish measurable spaces, $\tilde{\mathcal{X}}$ a Hausdorff space,
$\phi : \tilde{\mathcal{X}} \to \mathcal{X}_D$ continuous and injective, and require $\phi(\tilde{\mathcal{X}}) \in \mathcal{B}_D$. Let
$(P_I[X_I|\mathcal{C}_I])_D$ be a projective family of probability kernels on $\mathcal{B}_I \times \Omega$.
Define $\tilde{\Omega}$ to be the subset $\tilde{\Omega} \subset \Omega$ of all $\omega$ for which the family of measures
$(P_I[\,.\,|\mathcal{C}_I](\omega))_D$ satisfies the following "Prokhorov condition":
For all $\varepsilon > 0$, there is a compact set $K \subset \tilde{\mathcal{X}}$ such that

(3.5)  $P_I[\phi_I K|\mathcal{C}_I](\omega) > 1 - \varepsilon$  for all $I \in D$.

Then there is a unique (up to equivalence) probability kernel $\tilde{P}[\,.\,|\tilde{\mathcal{C}}_D](\omega)$ on
$\mathcal{B}(\tilde{\mathcal{X}}) \times \tilde{\Omega}$ with the projective family as its marginals, i.e.
$\phi_I \tilde{P}[\,.\,|\tilde{\mathcal{C}}_D] =_{a.e.} P_I[\,.\,|\mathcal{C}_I]$. This probability kernel

1. is a Radon measure for each $\omega \in \tilde{\Omega}$;

2. is the pullback of $P_D[\,.\,|\mathcal{C}_D] = \varprojlim (P_I[\,.\,|\mathcal{C}_I])_D$ under $\phi$, and hence a
conditional probability given $\tilde{\mathcal{C}}_D = \mathcal{C}_D \cap \tilde{\Omega}$.

In the following sections, we will derive a number of results on how certain
statistical properties of conditional models are preserved under projective
limits and pullbacks. For conditional probabilities constructed by means of
the Corollary, statement (2) makes all these results immediately applicable,
since the constructed probability kernel $\tilde{P}[\,.\,|\tilde{\mathcal{C}}_D]$ can effectively be
decomposed into the projective limit $P_D[\,.\,|\mathcal{C}_D]$ and a subsequent pullback.

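Proof. For almost all $\omega \in \tilde{\Omega}$, the measures $P_I[\,.\,|\mathcal{C}_I](\omega)$ form a projective
family and satisfy the Prokhorov condition. Since the spaces $\mathcal{X}_I$ are
Polish, each of these measures is a Radon measure. By Prokhorov's theorem
[9, IX.4.2], there is a unique Radon probability measure $\nu_\omega$ on $\tilde{\mathcal{X}}$
satisfying $\phi_I(\nu_\omega) = P_I[\,.\,|\mathcal{C}_I](\omega)$. By the Kolmogorov-Bochner extension
theorem (App. A, Theorem 4), there is also a unique projective limit
probability kernel $P_D[\,.\,|\mathcal{C}_D]$ on $\mathcal{X}_D = \varprojlim (\mathcal{X}_I)_D$. Since $\phi = \varprojlim (\phi_I)_D$, we have
$P_D[\,.\,|\mathcal{C}_D](\omega) = \phi(\nu_\omega)$ for almost all $\omega \in \tilde{\Omega}$. The image $\phi(\tilde{\mathcal{X}})$ is measurable,
so its outer measure coincides with its measure, and
$P_D^*[\phi(\tilde{\mathcal{X}})|\mathcal{C}_D](\omega) = P_D[\phi(\tilde{\mathcal{X}})|\mathcal{C}_D](\omega) = \nu_\omega(\tilde{\mathcal{X}})$. Therefore, the pullback of
$P_D[\,.\,|\mathcal{C}_D]$ under $\phi$ exists, and by uniqueness has to coincide with $\nu_\omega$ almost
everywhere. □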

The induced conditional probabilities $\tilde{P}[\,.\,|\tilde{\mathcal{C}}_D]$ on $\tilde{\mathcal{X}}$ are regular, since
measurability in $\omega$ carries over from $\mathcal{X}_D$ under the pullback. This is
remarkable in so far as virtually no requirements are imposed upon the space $\tilde{\mathcal{X}}$
- in particular, the topology of $\tilde{\mathcal{X}}$ need not admit a countable subbase -
and conditional probabilities on $\tilde{\mathcal{X}}$ need not be regular in general. In other
words, much as the Radon regularity of measures on a space which supports
non-Radon probability measures is induced by the marginals, so is regularity
of the conditional.

3.2. Criteria for Projectivity. To construct a conditional stochastic process
$P_D[X_D|\mathcal{C}_D]$ by means of Theorem 1 will in practice require proof that
a given family of conditional probabilities is projective. The following two
results provide applicable criteria.

Lemma 2 (Criterion 1). Let the random variables $X_I$ satisfy $f_{JI} X_J = X_I$,
and let $(\mathcal{C}_I)_D$ be a directed family of $\sigma$-algebras. Then the family $(P_I[X_I|\mathcal{C}_I])_D$
is projective if and only if the random variables satisfy the conditional
independence relations

(3.6)  $X_I \perp\!\!\!\perp_{\mathcal{C}_I} \mathcal{C}_J$  for all $I \preceq J$.

Proof. By the properties of conditional independence,

(3.7)  $X_I \perp\!\!\!\perp_{\mathcal{C}_I} \mathcal{C}_J \iff \mathbb{P}[A|\mathcal{C}_I, \mathcal{C}_J] =_{a.e.} \mathbb{P}[A|\mathcal{C}_I]$  for all $A \in \sigma(X_I)$.

See [28, Proposition 6.6]. Application to the definition of projectivity yields

(3.8)  $P_J[f_{JI}^{-1} A_I|\mathcal{C}_J] = \mathbb{P}[X_I^{-1} A_I|\mathcal{C}_J] =_{a.e.} P_I[A_I|\mathcal{C}_I]$,

where the first identity uses $X_I = f_{JI} X_J$, and the second holds if and only
if (3.7) does, since $\mathcal{C}_I \subset \mathcal{C}_J$ for a directed family. □

We recall that projectivity of conditional probabilities $P_I[X_I|\Theta_I]$ as in
(3.2) implies projectivity of the corresponding unconditional measures $P_I =
X_I(\mathbb{P})$. Lemma 2 gives a necessary and sufficient condition for the converse
to hold as well: If the $\sigma$-algebras $\mathcal{C}_I$ are generated by parameter variables $\Theta_I$,
(3.6) takes the form $X_I \perp\!\!\!\perp_{\Theta_I} \Theta_J$. For a fixed $I$, the criterion demands that -
given full knowledge of $\Theta_I$ - information about the parameters corresponding
to any other dimensions will not change our mind about $X_I$. If this is
true for any $I$, the family is conditionally projective. The lemma implies a
similar result by Lauritzen [40, IV, 3.1] on sufficient statistics: Since (3.6) is
a necessary condition, any sufficient statistics $(S_I)$ satisfy $X_I \perp\!\!\!\perp_{S_I} S_J$ if the
family of models is known to be projective.

In practice, a candidate family of finite-dimensional conditionals can be
expected to be defined by densities, with respect to some family $(\nu_I)_D$ of
carrier measures. The next criterion addresses the special case where the
projective system consists of product spaces $\mathcal{X}_I = \prod_{i \in I} \mathcal{X}_{\{i\}}$ as in Example
2, and hence $f_{JI} = \mathrm{pr}_{JI}$. The carrier measures are then typically product
measures, and proving that the family is projective involves an application
of Fubini's theorem. The following criterion makes this step generic.

Lemma 3 (Criterion 2). Let $(P_I[X_I|\Theta_I])_D$ be a family of conditional
probabilities on a projective system $(\prod_{i \in I} \mathcal{X}_{\{i\}}, \otimes_{i \in I} \mathcal{B}_{\{i\}}, \mathrm{pr}_{JI})_D$, where each $\mathcal{X}_{\{i\}}$ is
Polish. Require: (1) For all $I \in D$, the conditional density $p_I$ of $P_I[X_I|\Theta_I]$
with respect to a carrier measure $\nu_I$ on $\mathcal{X}_I$ exists. (2) The carrier measures
are product measures $\nu_I = \otimes_{i \in I} \nu_{\{i\}}$. Then the family $(P_I[X_I|\Theta_I])_D$ of
conditionals is projective if and only if

(3.9)  $\int_{\mathcal{X}_{J \setminus I}} p_J(x_J|\theta_J) \, d\nu_{J \setminus I}(x_{J \setminus I}) = p_I(x_I|\mathrm{pr}_{JI}\theta_J)$  $\nu_I$-a.e., for all $I \preceq J$.

The use of $J \setminus I$ as an index is justified by the fact that $D$ consists of all
finite subsets of a given set, and is ordered by inclusion. Therefore, $I \preceq J$
implies $J \setminus I \in D$.

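Proof. First suppose condition (3.9) is satisfied. Denote by $p_I(x_I|\theta_I)$ the
conditional density of $P_I[X_I|\Theta_I]$. By Fubini's theorem,

(3.10)  $\int_{\mathrm{pr}_{JI}^{-1} A_I} p_J(x_J|\theta_J) \, d\nu_J(x_J) = \int_{A_I} \Big( \int_{\mathcal{X}_{J \setminus I}} p_J(x_I, x_{J \setminus I}|\theta_J) \, d\nu_{J \setminus I}(x_{J \setminus I}) \Big) d\nu_I(x_I) = \int_{A_I} p_I(x_I|\mathrm{pr}_{JI}\theta_J) \, d\nu_I(x_I)$

for all $A_I \in \mathcal{B}_I$. Hence, $P_J[\mathrm{pr}_{JI}^{-1} A_I|\Theta_J = \theta_J] = P_I[A_I|\Theta_I = \mathrm{pr}_{JI}\theta_J]$ for all $\theta_J$ up
to a null set, which establishes the "if" implication. Conversely, assume that
the family is projective. Abbreviate $a(x_I, \theta_J) := \int_{\mathcal{X}_{J \setminus I}} p_J(x_J|\theta_J) \, d\nu_{J \setminus I}(x_{J \setminus I})$.
Then $P_J[\mathrm{pr}_{JI}^{-1} A_I|\Theta_J = \theta_J] = \int_{A_I} a(x_I, \theta_J) \, d\nu_I(x_I)$, and for all $A_I \in \mathcal{B}_I$,

(3.11)  $\int_{A_I} a(x_I, \theta_J) \, d\nu_I(x_I) = \int_{A_I} p_I(x_I|\theta_I) \, d\nu_I(x_I) = \int_{A_I} p_I(x_I|\mathrm{pr}_{JI}\theta_J) \, d\nu_I(x_I)$.

The first identity is simply projectivity, the second one follows from the fact
that $a(\,.\,, \theta_J)$ is $\mathcal{B}_I$-measurable by Tonelli's theorem. Since $a$ and $p_I$ integrate
identically over all $A_I$ and are $\mathcal{B}_I$-measurable, $a(\,.\,, \theta_J) = p_I(\,.\,|\mathrm{pr}_{JI}\theta_J)$ holds
$\nu_I$-a.s. □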
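
The marginalization condition (3.9) can be checked numerically for a candidate family of densities. The following is a minimal sketch, not part of the original argument: it assumes $J = \{1, 2\}$, $I = \{1\}$, Lebesgue carrier measures and unit-variance Gaussian densities, and the function names and the `leak` parameter are hypothetical choices made for illustration. With `leak = 0` the family satisfies (3.9); a non-zero `leak` lets $\theta_2$ influence $x_1$ and breaks it.

```python
import math

def normal_pdf(x, mean, var=1.0):
    """Density of N(mean, var) with respect to Lebesgue measure."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def p_J(x1, x2, theta1, theta2, leak=0.0):
    """Joint conditional density on X_J = R^2 for J = {1, 2} (illustrative).

    With leak == 0 the first coordinate depends on theta1 only. A non-zero
    leak lets theta2 shift the mean of x1, which violates (3.9).
    """
    return normal_pdf(x1, theta1 + leak * theta2) * normal_pdf(x2, theta2)

def p_I(x1, theta1):
    """Candidate marginal density on X_I = R for I = {1}."""
    return normal_pdf(x1, theta1)

def integrate_out_x2(x1, theta1, theta2, leak=0.0, lo=-12.0, hi=12.0, steps=4000):
    """Left-hand side of (3.9): trapezoidal integral of p_J over the coordinate in J minus I."""
    h = (hi - lo) / steps
    total = 0.5 * (p_J(x1, lo, theta1, theta2, leak) + p_J(x1, hi, theta1, theta2, leak))
    for k in range(1, steps):
        total += p_J(x1, lo + k * h, theta1, theta2, leak)
    return total * h

def max_violation(leak, grid=(-1.0, 0.0, 0.7), theta=(0.3, 1.5)):
    """Largest gap between the two sides of (3.9) over a few test points x_I."""
    t1, t2 = theta
    return max(abs(integrate_out_x2(x, t1, t2, leak) - p_I(x, t1)) for x in grid)
```

Under these assumptions, `max_violation(0.0)` is zero up to quadrature error, whereas `max_violation(0.5)` is clearly positive, in line with the "only if" direction of Lemma 3.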

4. Application to Bayesian Models. The results of the previous sec- 
tion provide the formal means of defining projective limits of Bayesian mod- 
els, since a Bayesian model is completely defined by a pair of conditional 
probabilities. Combination of such projective limits with pullbacks under
Borel embeddings allows us to represent nonparametric Bayesian models 
by projective families of finite-dimensional Bayesian models. Since the term 
"parametric model" is often associated with finite-dimensional or dominated 
models, we will instead use the term "parametrized" to describe a statistical 
model indexed by a parameter, regardless of whether the dimension of the 
parameter is finite or infinite. 

4.1. Parametrized and Bayesian Models. We briefly recall the formal notion
of model and parameter; a detailed discussion is given by Schervish [51,
Ch. 1.5.5]. Let $X : \Omega \to \mathcal{X}$ be a random variable with values in a Polish
space $\mathcal{X}$, such that the image measure $X^\infty(\mathbb{P})$ is exchangeable. Let $M(\mathcal{X})$ be the set
of probability measures on $\mathcal{X}$, and denote by $F : \mathcal{X}^\infty \to M(\mathcal{X})$ the mapping
induced by the empirical measure. Let $\Psi$ be a parametric index, i.e. a
bimeasurable mapping from the image $(F \circ X^\infty)(\Omega) \subset M(\mathcal{X})$ onto a measurable
space $(\mathcal{T}, \mathcal{B}_{\mathcal{T}})$. Then the derived random variable $\Theta := \Psi \circ F \circ X^\infty$ is
called a parameter, and we call the regular conditional probability $P[X|\Theta]$
a parametrized model. In summary,

(4.1)  $\Omega \xrightarrow{X^\infty} \mathcal{X}^\infty \xrightarrow{F} M(\mathcal{X}) \supset \mathcal{V} \xrightarrow{\Psi} \mathcal{T}$,

where the set $\mathcal{V} := \{P[X|\Theta = \theta] \,|\, \theta \in \mathcal{T}\}$ is the model $P[X|\Theta]$ regarded
as a family of measures. The assumption that X is Polish guarantees both 
the existence of regular conditional probabilities on X and the validity of de 
Finetti's theorem [28, Theorem 11.10]. The theorem in turn implies a law of 
large numbers, which guarantees convergence linin Fn{X^) — )• X(¥) of the 
empirical measure in the weak* topology on M{X), and hence ensures that 
F is well-defined. 

The parameter random variable induces an image measure $P^\Theta = \Theta(\mathbb{P})$
on the parameter space $(\mathcal{T}, \mathcal{B}_{\mathcal{T}})$. For any given abstract random event $\omega \in \Omega$,
the corresponding value $\theta = \Theta(\omega)$ of the parameter is completely determined
by $X^\infty(\omega)$, as the image under $\Psi \circ F$. The partial information about
$\Theta(\omega)$ contained in a finite sample $X^n(\omega) = x^n$ can be conditioned on as
$P^\Theta[\Theta|X^n = x^n]$. Under suitable conditions the actual value $\theta = \Theta(\omega)$ is
asymptotically recovered as $P^\Theta[\Theta|X^n = x^n] \xrightarrow{n \to \infty} \delta_\theta$. In this context, $P^\Theta(\Theta)$
is referred to as a prior distribution, $P[X|\Theta]$ as a sampling model or likelihood,
and $P^\Theta[\Theta|X]$ as the posterior under observation $X$. Additionally, the
prior can be represented as a parametrized model $P^\Theta[\Theta|Y = y]$, where $Y$ is
a hyperparameter. We refer to the whole system summarily as the Bayesian
model defined by $P[X|\Theta]$ and $P^\Theta[\Theta|Y]$. For our purposes, it is sufficient to
assume that the prior is the "true" prior, i.e. the actual image measure under
$\Theta$. If the sampling model is dominated, Bayes' theorem is applicable, and the
posterior can be represented by the density $p(x|\theta)/p(x)$ with respect to the prior.
For undominated models, notably the Dirichlet process, some alternative to
Bayes' theorem is required. In Bayesian nonparametrics, this alternative is
usually conjugacy (Sec. 6).



4.2. Application of Projective Limits. Suppose that, for a projective system
of sample spaces $(\mathcal{X}_I, f_{JI})_D$, a parametrized model is given on each space:
Each object in (4.1), except for the abstract probability space $\Omega$, is equipped
with an index $I$.

(4.2)  $\Omega \xrightarrow{X_I^\infty} \mathcal{X}_I^\infty \xrightarrow{F_I} M(\mathcal{X}_I) \supset \mathcal{V}_I \xrightarrow{\Psi_I} \mathcal{T}_I$

The mappings $f_{JI} : \mathcal{X}_J \to \mathcal{X}_I$ induce projections $f_{JI} : \mathcal{X}_J^\infty \to \mathcal{X}_I^\infty$ and $f_{JI} :
M(\mathcal{X}_J) \to M(\mathcal{X}_I)$. If $f_{JI}\mathcal{V}_J = \mathcal{V}_I$, bimeasurability of $\Psi_I$ implies that $f_{JI}$ has
a unique, measurable pushforward $g_{JI} := \Psi_I \circ f_{JI} \circ \Psi_J^{-1}$ on $\mathcal{T}_J$. Hence, if
the conditional probabilities $P_I[\,.\,|\Theta_I]$ defining the parametrized models $\mathcal{V}_I$
are projective, we obtain $f_{JI} P_J[\,.\,|\Theta_J = \theta_J] = P_I[\,.\,|\Theta_I = g_{JI}\theta_J]$. By applying
a projective limit to all spaces, mappings and conditionals indexed by $I$
in (4.2), we obtain a projective limit system of the form (4.1) - with each
quantity indexed by $D$, respectively. The resulting diagram again constitutes
a parametrized model. We refer to this model as a projective limit model in
the following.

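The definition immediately carries over to Bayesian models: A projective
system of Bayesian models is defined by three projective systems of standard
Borel spaces $(\mathcal{X}_I, \mathcal{B}(\mathcal{X}_I), f_{JI})_D$, $(\mathcal{T}_I, \mathcal{B}(\mathcal{T}_I), g_{JI})_D$ and $(\mathcal{Y}_I, \mathcal{B}(\mathcal{Y}_I), h_{JI})_D$, and by
projective families $(P_I[X_I|\Theta_I])_D$ and $(P_I^\Theta[\Theta_I|Y_I])_D$ of conditional
distributions. The uniqueness up to equivalence of the projective limit conditionals
(Theorem 1) implies that the following diagram commutes:

(4.3)  [Commutative diagram; recoverable labels: the projections $g_I$ and the marginal posteriors $P_I^\Theta[\Theta_I|X_I]$.]
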

In other words, we obtain the same posterior regardless of whether we (i) 
take projective limits of the finite-dimensional models and then compute the 
infinite-dimensional posterior, or (ii) compute all finite-dimensional posteri- 
ors under marginal observations and take the projective limit. 

4.3. Application of Pullbacks. Pullbacks can be applied to parametrized
and Bayesian models in a manner largely analogous to projective limits.
However, the pullback in general results in a restriction of the abstract
probability space: If a probability measure $P = X(\mathbb{P})$ is pulled back under an
injective map $\phi : \tilde{\mathcal{X}} \to \mathcal{X}$, the resulting random variable $\tilde{X} = \phi^{-1} \circ X$ is only
defined on $\tilde{\Omega} := X^{-1}\phi(\tilde{\mathcal{X}})$. As a subset of the abstract probability space, $\tilde{\Omega}$
can always be assumed measurable, and we will for simplicity assume that it
is not a null set. The corresponding restriction $\tilde{\mathbb{P}}$ of the abstract probability
measure $\mathbb{P}$ is the conditional $\tilde{\mathbb{P}}(\,.\,) = \mathbb{P}[\,.\,|\tilde{\Omega}]$, i.e. the abstract probability
space underlying the pullback measure $\tilde{P} = \tilde{X}(\tilde{\mathbb{P}})$ is $(\tilde{\Omega}, \mathcal{A} \cap \tilde{\Omega}, \tilde{\mathbb{P}})$.

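Consider the parametrized model $P[X|\Theta]$ described by (4.1). In this case,
the entire diagram (4.1) may be pulled back to obtain

(4.4)  $\begin{array}{ccccccc} \tilde{\Omega} & \xrightarrow{\tilde{X}^\infty} & \tilde{\mathcal{X}}^\infty & \xrightarrow{\tilde{F}} & M(\tilde{\mathcal{X}}) \supset \tilde{\mathcal{V}} & \xrightarrow{\tilde{\Psi}} & \tilde{\mathcal{T}} \\ \downarrow {\scriptstyle J_{\tilde{\Omega}}} & & \downarrow & & \downarrow & & \downarrow {\scriptstyle J_{\tilde{\mathcal{T}}}} \\ \Omega & \xrightarrow{X^\infty} & \mathcal{X}^\infty & \xrightarrow{F} & M(\mathcal{X}) \supset \mathcal{V} & \xrightarrow{\Psi} & \mathcal{T} \end{array}$

The pullback is applicable only for those values $\Theta = \theta$ with $P^*[\tilde{\mathcal{X}}|\Theta = \theta] =
1$. Let $\tilde{\mathcal{T}} \subset \mathcal{T}$ be the set of such values. Denote the corresponding set of
pullbacks $\tilde{\mathcal{V}}$. The restriction $\tilde{\Omega}$ induced by the pullback is represented in the
diagram by the canonical inclusion mapping $J_{\tilde{\Omega}}$. The mappings $\tilde{X}$, $\tilde{F}$ and $\tilde{\Psi}$
are the restrictions of the mappings $X$, $F$ and $\Psi$ to the respective restricted
domains. Whenever $\theta \in \tilde{\mathcal{T}}$, we write $\tilde{P}[\,.\,|\tilde{\Theta} = \theta]$ for the pullback measure of
$P[\,.\,|\Theta = \theta]$. This notation as a model parametrized by $\tilde{\Theta}$ is justified by the
following lemma.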

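Lemma 4. Let $\tilde{\nu}_\theta(\,.\,)$ be the pullback of $P[X|\Theta = \theta]$. Then the function
$(A, \omega) \mapsto \tilde{\nu}_{\tilde{\Theta}(\omega)}(A)$ is a regular conditional probability of $\tilde{X}$ given the random
variable $\tilde{\Theta} := \tilde{\Psi} \circ \tilde{F} \circ \tilde{X}^\infty$, i.e. a regular version of $\tilde{\mathbb{P}}[\tilde{X} \in A|\sigma(\tilde{\Theta})](\omega)$.

Proof. The function $\tilde{\nu}_{\tilde{\Theta}(\,.\,)}$ is the restriction $\tilde{\nu}_{\tilde{\Theta}(\,.\,)}(A \cap \tilde{\mathcal{X}}) = P[A|\Theta]|_{\tilde{\Omega}}$ of
the integrable, $\sigma(\Theta)$-measurable function $P[A|\Theta](\,.\,)$. Since $\sigma(\tilde{\Theta}) = \sigma(\Theta) \cap \tilde{\Omega}$,
the mapping $\omega \mapsto \tilde{\nu}_{\tilde{\Theta}(\omega)}(A \cap \tilde{\mathcal{X}})$ is $\sigma(\tilde{\Theta})$-measurable for every $A \in \mathcal{B}_{\mathcal{X}}$. The
pullback of an integrable function preserves the integral (cf. (A.4)). Hence,
$\tilde{\nu}_{\tilde{\Theta}(\,.\,)}$ is a conditional given $\sigma(\tilde{\Theta})$: For any $C \in \sigma(\Theta)$,

$\int_{C \cap \tilde{\Omega}} \tilde{\nu}_{\tilde{\Theta}(\omega)}(A \cap \tilde{\mathcal{X}}) \, d\tilde{\mathbb{P}}(\omega) = \int_C \mathbb{P}[X^{-1}A|\Theta](\omega) \, d\mathbb{P}(\omega)$
$= \mathbb{P}(X^{-1}A \cap C) = \tilde{\mathbb{P}}((X^{-1}A \cap \tilde{\Omega}) \cap (C \cap \tilde{\Omega}))$.

□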

For a Bayesian model, the pullback is consecutively applied to the sampling
model and to the prior. The pullback of $P[X|\Theta]$ induces a restriction
of the parameter space from $\mathcal{T}$ to $\tilde{\mathcal{T}}$. Lemma 4 guarantees that the induced
random variable $\tilde{\Theta}$ is indeed the parameter variable of the resulting model.
The prior family $P^\Theta[\Theta|Y]$ can hence be pulled back under $J_{\tilde{\mathcal{T}}}$. The pullback
exists for all $y \in \mathcal{Y}$ with outer measure $P^{\Theta,*}[\tilde{\mathcal{T}}|Y = y] = 1$, which in turn, by
another application of Lemma 4, induces a restriction of the hyperparameter
space to $\tilde{\mathcal{Y}} \subset \mathcal{Y}$.

4.4. Nonparametric Evaluation. The term nonparametric Bayesian model
usually implies that a set of finite-dimensional measurements are explained
by a posterior distribution on an infinite-dimensional parameter space $\mathcal{T}$. In
some models, the sampling distribution $P[X|\Theta]$ is chosen to generate
finite-dimensional values given an instance $\theta$ of the infinite-dimensional parameter
variable $\Theta$; the Dirichlet process construction in Sec. 7.1 is an example of
such a model, where e.g. $\mathcal{X} = \mathbb{R}$ and $\theta$ is a probability measure on $\mathbb{R}$. For
other models, such as the Gaussian process, it may be more convenient
to assume that finite-dimensional measurements are censored observations
of infinite-dimensional random quantities. Which of these assumptions is 
appropriate, and how a posterior is to be computed under censored obser- 
vations, depends on the model in question. 

The setting can in general be formalized as follows. Let $I_1, \ldots, I_n \in D$ be
index sets, and suppose measurements $x_{I_j} \in \mathcal{X}_{I_j}$ are reported for $j = 1, \ldots, n$.
The nonparametric Bayesian model explains these measurements as being
generated by (i) drawing $\theta$ from the prior distribution; (ii) generating $n$
samples $x^{(1)}, \ldots, x^{(n)}$ from $P[X|\Theta = \theta]$; and finally, (iii) censoring the
samples as $x_{I_j} = \phi_{I_j} x^{(j)}$. Whether the index sets $I_j$ are fixed or generated
at random does not affect the formalism, provided that their choice is
stochastically independent of the random variables in the model. Since the
censored observations $x_{I_j}$ are represented as projections of separate instances
$x^{(1)}, x^{(2)}, \ldots$, they are conditionally independent given $\theta$. Asymptotically,
we recover either $\theta$, or a censored version of $\theta$, depending on the index sets
$I_j$ at which sample information is obtained.
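
The three generative steps (i)-(iii) can be sketched in code. This is only an illustration under simplifying assumptions that are not made in the text: the infinite-dimensional parameter is truncated to a finite number `dim` of coordinates, the prior and the noise are standard Gaussian, and censoring is a plain coordinate projection. All function names are hypothetical.

```python
import random

def draw_theta(dim, rng):
    """Step (i): draw a parameter vector from a standard Gaussian prior (assumed)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def draw_sample(theta, rng):
    """Step (ii): one full sample x^(j) = theta + unit-variance noise (assumed)."""
    return [t + rng.gauss(0.0, 1.0) for t in theta]

def censor(x, index_set):
    """Step (iii): coordinate projection of x onto the index set I_j."""
    return {i: x[i] for i in sorted(index_set)}

def generate_censored(index_sets, dim, seed=0):
    """Run steps (i)-(iii) for the given index sets I_1, ..., I_n."""
    rng = random.Random(seed)
    theta = draw_theta(dim, rng)
    observations = []
    for index_set in index_sets:
        x_full = draw_sample(theta, rng)  # a separate instance per measurement
        observations.append(censor(x_full, index_set))
    return theta, observations
```

Because each censored observation is projected from its own full sample, the observations are conditionally independent given `theta`, mirroring the remark above. The index sets are passed in as fixed arguments, so their choice is trivially independent of the random draws.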

5. Sufficient Statistics. The purpose of this section is to show that 
the application of sufficient statistics commutes with the application of 
projective limits and pullbacks. If each element of a projective family of 
parametrized models admits a sufficient statistic, the projective limit of 
these functions is a sufficient statistic for the projective limit model. Sim- 
ilarly, the pullback of the sufficient statistic is a sufficient statistic for the 
pullback model. 

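Definition 3 (Sufficient statistic [23]). Let $P[X|\Theta]$ be a regular
conditional probability. A $\sigma$-algebra $\mathcal{S} \subset \mathcal{A}$ is called sufficient for $P[X|\Theta]$ if
there is a probability kernel $k : \mathcal{A} \times \Omega \to [0, 1]$, such that (i) $\omega \mapsto k(B, \omega)$ is
$\mathcal{S}$-measurable for all $B \in \mathcal{B}_{\mathcal{X}}$, and (ii) for all $B \in \mathcal{B}(\mathcal{X})$,

(5.1)  $P[B|\Theta, \mathcal{S}](\omega) = k(X^{-1}B, \omega)$  $\mathbb{P}$-a.s.

If $\mathcal{S}$ is sufficient and $(\mathcal{U}, \mathcal{B}(\mathcal{U}))$ is a measurable Polish space, then a
measurable mapping $S : \mathcal{X} \to \mathcal{U}$ is called a sufficient statistic for $P[X|\Theta]$ if
$\mathcal{S} = \sigma(S \circ X)$.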

Theorem 2 (Sufficient $\sigma$-algebras and projective limits). Consider a
projective limit model $P_D[X_D|\Theta_D] = \varprojlim (P_I[X_I|\Theta_I])_D$. For each $I \in D$, let
$\mathcal{S}_I \subset \mathcal{A}$ be a sufficient $\sigma$-algebra for $P_I[X_I|\Theta_I]$.

1. If $(\mathcal{S}_I)_D$ is directed, $\mathcal{S}_D := \sigma(\mathcal{S}_I; I \in D)$ is sufficient for $P_D[X_D|\Theta_D]$.

2. Let $(\mathcal{U}_I, \mathcal{B}(\mathcal{U}_I), h_{JI})_D$ be a projective system of measurable spaces and
$S_I : \mathcal{X}_I \to \mathcal{U}_I$ projective measurable mappings. If each $S_I$ is sufficient
for $P_I[X_I|\Theta_I]$, a sufficient statistic for $P_D[X_D|\Theta_D]$ is given by $S_D := \varprojlim (S_I)_D$.

3. Conversely, if any $\mathcal{C} \subset \mathcal{A}$ is sufficient for a projective limit model
$P_D[X_D|\Theta_D]$, it is sufficient for all marginals $P_I[X_I|\Theta_I]$.

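Proof. (1) We have to show that $\mathcal{S}_D$ satisfies (5.1), which is equivalent
to $X_D \perp\!\!\!\perp_{\mathcal{S}_D} \Theta_D$ [28, Proposition 6.6]. We will draw on two properties of
conditional independence: Consider two $\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$. Firstly, if $D$ is any
countable set, and $(\mathcal{C}_I)_{I \in D}$ a family of $\sigma$-algebras, then

(5.2)  $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{C}_I$ for all $I$  $\Longrightarrow$  $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \sigma(\mathcal{C}_I; I \in D)$.

Since $\mathcal{G}$ is fixed, (5.2) is a direct consequence of the analogous result for
unconditional independence [28, Corollary 2.7]. Secondly, suppose the index
set $D$ is a directed set and $(\mathcal{C}_I)_{I \in D}$ a directed family. Then for any fixed
$I_0 \in D$, the following holds:

(5.3)  ( $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{C}_J$ for all $J \succeq I_0$ )  $\Longrightarrow$  ( $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{C}_J$ for all $J \in D$ ).

Under the assumptions of the theorem, $\mathcal{S}_I$ satisfies (5.1) for each $I$, or
equivalently, $X_I \perp\!\!\!\perp_{\mathcal{S}_I} \Theta_I$. Since the family of conditionals is projective, we
additionally know $X_I \perp\!\!\!\perp_{\Theta_I} \Theta_J$ (Lemma 2). Therefore, by the chain rule,
$X_I \perp\!\!\!\perp_{\mathcal{S}_I} \Theta_J$ whenever $I \preceq J$, and hence for all $J \in D$ according to (5.3).
Since $\sigma(\Theta_D) = \sigma(\Theta_I; I \in D)$, we can apply (5.2) to obtain $X_I \perp\!\!\!\perp_{\mathcal{S}_I} \Theta_D$. The
fact that $\mathcal{S}_I \subset \mathcal{S}_D$ implies $X_I \perp\!\!\!\perp_{\mathcal{S}_D} \Theta_D$. The $\sigma$-algebra generated by the
projective limit variable $X_D$ is simply $\sigma(X_I; I \in D)$, so another application of
(5.2) yields $X_D \perp\!\!\!\perp_{\mathcal{S}_D} \Theta_D$, which makes $\mathcal{S}_D$ sufficient for the projective limit
model.

(2) Since the functions $S_I$ are projective, the family $(\sigma(S_I))_D$ of $\sigma$-algebras is
directed, and $\sigma(S_D) = \sigma(S_I; I \in D)$. The claim follows from part (1) above.

(3) Let $k_D$ be the kernel for which $\mathcal{C}$ and $P_D[X_D|\Theta_D]$ satisfy (5.1). We need to
derive a suitable kernel $k_I$ for each $I \in D$. By Lemma 2, the marginals satisfy
$X_I \perp\!\!\!\perp_{\Theta_I} \Theta_J$, and hence $X_I \perp\!\!\!\perp_{(\Theta_I, \mathcal{C})} \Theta_J$. Since trivially also $X_I \perp\!\!\!\perp_{(\Theta_I, \mathcal{C})} \mathcal{C}$, the
chain rule yields $X_I \perp\!\!\!\perp_{(\Theta_I, \mathcal{C})} (\Theta_J, \mathcal{C})$. Again by Lemma 2, the latter implies

(5.4)  $(f_I P_D)[X_D|\Theta_D, \mathcal{C}] =_{a.e.} P_I[X_I|\Theta_I, \mathcal{C}]$.

Since $\mathcal{C}$ is sufficient for $P_D[X_D|\Theta_D]$, the left-hand side of (5.4) is equal
to a kernel $f_I k_D$. Hence, $k_I := f_I k_D$ is a $\mathcal{C}$-measurable kernel satisfying
$k_I(A_I, \omega) =_{a.e.} P_I[A_I|\Theta_I, \mathcal{C}](\omega)$, which makes $\mathcal{C}$ sufficient for $P_I[X_I|\Theta_I]$. □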
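
The projectivity requirement on the statistics in part (2) of Theorem 2 says that computing the statistic and projecting commute. A minimal sketch of this property follows, assuming (for illustration only) a product-space setting with coordinate projections and the coordinate-wise sum as statistic, which is a standard sufficient statistic for a Gaussian mean under known unit covariance. The names `project` and `sum_statistic` are hypothetical.

```python
def project(x, index_set):
    """Coordinate projection onto index_set: keep the entries of x indexed by it."""
    return {i: x[i] for i in index_set}

def sum_statistic(samples):
    """Candidate statistic: coordinate-wise sum over a list of samples."""
    keys = samples[0].keys()
    return {i: sum(x[i] for x in samples) for i in keys}

I = (0, 2)  # sub-index set of J = {0, 1, 2, 3}
samples_J = [
    {0: 1.0, 1: 2.0, 2: -1.0, 3: 0.5},
    {0: 0.5, 1: 0.0, 2: 3.0, 3: 1.5},
    {0: -2.0, 1: 1.0, 2: 1.0, 3: 2.0},
]

# Compute the statistic on J, then project to I ...
via_larger_space = project(sum_statistic(samples_J), I)
# ... or project each sample to I first, then compute the statistic.
via_marginal = sum_statistic([project(x, I) for x in samples_J])
```

Both routes yield the same value, which is the finite-dimensional analogue of the relation that lets the statistics $S_I$ be assembled into a limit statistic.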

To make the construction of the sufficient statistics of a parametrized
stochastic process fully compatible with the construction of the process itself
requires an analogous result for pullbacks.

Proposition 2 (Sufficiency and pullbacks). Let $\tilde{P}[\tilde{X}|\tilde{\Theta}]$ be the pullback
of a parametrized model $P[X|\Theta]$ under a Borel embedding $J_{\tilde{\mathcal{X}}} : \tilde{\mathcal{X}} \to \mathcal{X}$. If
$S : \mathcal{X} \to \mathcal{U}$ is a sufficient statistic for $P[X|\Theta]$, then $\tilde{S} := S \circ J_{\tilde{\mathcal{X}}}$ is sufficient
for $\tilde{P}[\tilde{X}|\tilde{\Theta}]$.

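Proof. We have to show that $\tilde{P}[\tilde{X}|\tilde{\Theta}]$ and $\tilde{S}$ satisfy (5.1) for a suitable
kernel $\tilde{k}$, which we define as follows. Let $k(A, \omega)$ be the kernel (5.1) for $S$ and
$P[X|\Theta]$. The domain of $\tilde{S} \circ \tilde{X}$ is $(\tilde{S} \circ \tilde{X})^{-1}\mathcal{U} = \tilde{\Omega}$. For any $A \in \sigma(X)$, define
$\tilde{k}$ as $\tilde{k}(A \cap \tilde{\Omega}, \,.\,) := k(A, \,.\,)|_{\tilde{\Omega}}$. By definition, $\omega \mapsto \tilde{k}(A \cap \tilde{\Omega}, \omega)$ is measurable
with respect to $\sigma(S \circ X) \cap \tilde{\Omega} = \sigma(\tilde{S} \circ \tilde{X})$, which implies measurability with
respect to the finer $\sigma$-algebra $\sigma(\tilde{\Theta}, \tilde{S} \circ \tilde{X}) = \sigma(\Theta, S \circ X) \cap \tilde{\Omega}$. Hence, (5.1)
is satisfied if the integral of $\tilde{k}$ matches that of the conditional probability
for each set in $\sigma(\tilde{\Theta}, \tilde{S} \circ \tilde{X})$. Let $\tilde{C} = C \cap \tilde{\Omega}$ be any such set. As pullbacks
preserve integrals in the sense of (A.4),

$\int_{\tilde{C}} \tilde{k}(A, \tilde{S} \circ \tilde{X}(\omega)) \, d\tilde{\mathbb{P}}(\omega) = \int_C k(A, S \circ X(\omega)) \, d\mathbb{P}(\omega) = \mathbb{P}(X^{-1}A \cap C)$
$= \tilde{\mathbb{P}}(\tilde{X}^{-1}A \cap C \cap \tilde{\Omega}) = \int_{\tilde{C}} \tilde{P}[A|\tilde{\Theta}, \tilde{S}](\omega) \, d\tilde{\mathbb{P}}(\omega)$.

Thus $\tilde{S}$, $\tilde{k}$ and $\tilde{P}[\tilde{X}|\tilde{\Theta}]$ satisfy (5.1), and $\tilde{S}$ is sufficient for $\tilde{P}[\tilde{X}|\tilde{\Theta}]$. □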

We conclude this section with a result on minimality, i.e. the question
whether a "smallest" sufficient $\sigma$-algebra exists for a given model. The
concept is closely related to that of a minimal sufficient statistic - a sufficient
statistic to which any statistic sufficient for the model can be reduced by
transformation - but the two are not equivalent [37].

Definition 4 (Minimal sufficient $\sigma$-algebra [37]). A $\sigma$-algebra $\mathcal{S}_0 \subset \mathcal{A}$
is called minimal sufficient for $P[X|\Theta]$ if it is sufficient, and if every other
sufficient $\sigma$-algebra $\mathcal{C}$ satisfies:

(5.5)  $\forall A \in \mathcal{S}_0 \; \exists C \in \mathcal{C} : \; P[A \triangle C|\Theta = \theta] = 0$  for all $\theta \in \mathcal{T}$.

Intuitively, minimality captures the idea that any $\sigma$-algebra $\mathcal{C}$ can only be
sufficient for the model if it contains all information contained in $\mathcal{S}_0$ (though
this interpretation is inaccurate in the undominated case, as pointed out
by Burkholder [10]). However, instead of demanding $\mathcal{S}_0 \subset \mathcal{C}$, and hence
that every set in $\mathcal{S}_0$ is also in $\mathcal{C}$, we only require that each set in $\mathcal{S}_0$ be
indistinguishable from a set in $\mathcal{C}$ under the resolution of the model.

A minimal sufficient $\sigma$-algebra always exists if the model $P[X|\Theta]$ in question
is dominated. In undominated models, a sufficient $\sigma$-algebra can - rather
contrary to intuition - be contained in a finer $\sigma$-algebra which is not sufficient,
and a minimal sufficient $\sigma$-algebra need not exist [10]. However, as the
following theorem shows, existence is guaranteed if the model is constructed
as a projective limit from dominated marginals. This implies, for example,
that the Dirichlet process on the line admits a minimal sufficient $\sigma$-algebra,
even though it is undominated.

Proposition 3 (Minimal sufficiency). Suppose that each $\sigma$-algebra $\mathcal{S}_I$
as specified in Theorem 2 is minimal sufficient for $P_I[X_I|\Theta_I]$. Then $\mathcal{S}_D =
\varprojlim (\mathcal{S}_I)_D$ is minimal sufficient for $P_D[X_D|\Theta_D] = \varprojlim (P_I[X_I|\Theta_I])_D$.

Proof. $\mathcal{S}_D$ is sufficient by Theorem 2; we have to verify (5.5). Let $\mathcal{C} \subset \mathcal{A}$
be any sufficient $\sigma$-algebra for $P_D[X_D|\Theta_D]$. By Theorem 2, $\mathcal{C}$ is sufficient
for all $P_I[X_I|\Theta_I]$, which implies that (5.5) is satisfied if $A \in \cup_I \mathcal{S}_I$. For the
general case $A \in \mathcal{S}_D$, observe that the set system $\cup_I \mathcal{S}_I$ is both an algebra
and a generator of $\mathcal{S}_D$. By the basic theorem on approximation of a measure
on a subalgebra [2, Theorem 5.7], any set $A \in \mathcal{S}_D$ can hence be approximated
by a sequence of sets $A_n \in \cup_I \mathcal{S}_I$ such that $\lim_n P[A \triangle A_n|\Theta = \theta] = 0$.
Since each $A_n$ satisfies (5.5), there is a corresponding set $C_n \in \mathcal{C}$ such
that $P[A_n \triangle C_n|\Theta = \theta] = 0$. Then $\lim_n P[A \triangle C_n|\Theta = \theta] = 0$, and therefore
$P[A \triangle \cup_n C_n|\Theta = \theta] = 0$. Since $\mathcal{C}$ is a $\sigma$-algebra, $\cup_n C_n \in \mathcal{C}$, and $A$ satisfies
(5.5) for $C := \cup_n C_n$. □

6. Conjugacy. The posterior of a Bayesian model is a regular condi- 
tional probability, and always exists if the model is defined on Polish spaces. 
However, since the abstract components of the model - the probability space
$(\Omega, \mathcal{A}, \mathbb{P})$ and the random variables $X$ and $\Theta$ - are not given explicitly, there
is in general no way to deduce the posterior from the sampling distribution
and the prior. The problem is solved by Bayes' theorem whenever the
sampling distribution is dominated, i.e. if $P[X|\Theta]$ has a conditional density
[51, Theorem 1.31]. This need not be the case in the infinite-dimensional 
setting of Bayesian nonparametrics. For a certain class of Bayesian models, 
so-called conjugate models, the posterior can be specified without appealing 
to Bayes' theorem. Virtually all nonparametric Bayesian models studied in 
the literature are of this type (see e.g. [55]). 

Definition 5 (Conjugate Bayesian model). Let $P[X|\Theta]$ and $P^\Theta[\Theta|Y]$
specify a Bayesian model. Let $(T^{(n)})_n$ be a family of measurable mappings
$T^{(n)} : \mathcal{X}^n \times \mathcal{Y} \to \mathcal{W}$ with values in a Polish space. The family is called a
posterior index of the model if there exists a probability kernel $k : \mathcal{B}_{\mathcal{T}} \times \mathcal{W} \to
[0, 1]$ such that

(6.1)  $P^\Theta[A|X^n = (x_1, \ldots, x_n), Y = y] =_{a.e.} k(A, T^{(n)}(x_1, \ldots, x_n, y))$

for every $A \in \mathcal{B}_{\mathcal{T}}$. A Bayesian model is called conjugate if there is a posterior
index for which $\mathcal{W} \subset \mathcal{Y}$ and the associated kernel $k$ satisfies

(6.2)  $k(A, y') = P^\Theta[A|Y = y']$  for all $y' \in \mathcal{Y}$.

Evidently, any Bayesian model admits the identity $T^{(n)} := \mathrm{Id}_{\mathcal{X}^n \times \mathcal{Y}}$ as
a trivial posterior index. By (6.2), a conjugate posterior is "in the same
family" as the prior, a model property commonly referred to as closure
under sampling [47].
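
A familiar finite-dimensional instance of a posterior index is the Beta-Bernoulli model, which is used here purely as an illustration and is not one of the models constructed in the text. In this sketch, the hyperparameter $y = (a, b)$ parametrizes a Beta prior, the observations take values in $\{0, 1\}$, and the function names are hypothetical. The map `posterior_index` plays the role of $T^{(n)}$, and its range lies in the hyperparameter space, as required for conjugacy.

```python
from fractions import Fraction

def posterior_index(xs, y):
    """T^(n): maps observations x_1..x_n in {0, 1} and the hyperparameter
    y = (a, b) of a Beta prior to the updated hyperparameter y'."""
    a, b = y
    successes = sum(xs)
    return (a + successes, b + len(xs) - successes)

def kernel_mean(y):
    """Mean of k(., y) = Beta(a, b); by (6.2) this is also the prior mean at y."""
    a, b = y
    return Fraction(a, a + b)

prior = (2, 3)
data = [1, 0, 1, 1]

# Batch update with all observations at once.
batch = posterior_index(data, prior)

# Sequential update, feeding each posterior back in as the next prior.
sequential = prior
for x in data:
    sequential = posterior_index([x], sequential)
```

That the batch and sequential updates agree reflects closure under sampling: every posterior is again a member of the prior family, so it can serve as the prior for further observations.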

In a projective system, we have to consider a family of spaces $\mathcal{W}_I$ as the
respective ranges of the posterior indices $(T_I^{(n)})_n$. As for the hyperparameter
spaces $\mathcal{Y}_I$, we will denote the projectors on these spaces by $h_{JI}$, since $\mathcal{W}_I$ and
$\mathcal{Y}_I$ are either subsets of one another, or can without loss of generality be
assumed to be contained in a common superspace. The following theorem
states that the posterior updates of a nonparametric Bayesian model have
the same "functional form" as those of its finite-dimensional marginals. It
also implies that conjugacy of the model requires conjugate marginals.

Theorem 3 (Conjugacy in projective limit models). Let $(P_I[X_I|\Theta_I])_D$
and $(P_I^\Theta[\Theta_I|Y_I])_D$ define a projective family of Bayesian models.

1. Let $(T_I^{(n)})_n$ be posterior indices and projective, i.e.

(6.3)  $T_I^{(n)} \circ (f_{JI}^n \otimes h_{JI}) = h_{JI} \circ T_J^{(n)}$  for $I \preceq J \in D$, $n \in \mathbb{N}$.

Then the mappings $T_D^{(n)} := \varprojlim (T_I^{(n)})_D$ form a posterior index of
the projective limit model. If each marginal model is conjugate under
$(T_I^{(n)})_n$, the projective limit model is conjugate under $(T_D^{(n)})_n$.

2. Conversely, let the projective limit be conjugate. If the canonical mappings
$f_I$, $h_I$ are surjective, the marginals of the model are closed under
sampling. If additionally each of the mappings $f_I$ and $h_I$ is open
or closed, the marginals are conjugate and their posterior indices are
pushforwards of $(T_D^{(n)})_n$ satisfying

(6.4)  $T_I^{(n)} \circ (f_I^n \otimes h_I) = h_I \circ T_D^{(n)}$  for $I \in D$, $n \in \mathbb{N}$.

(Proof: App. B.)
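
The commutation relation (6.3) can be illustrated with a Dirichlet-multinomial update under coarsening of cells. This is a sketch under stated assumptions rather than the construction used in the text: the posterior index is taken to act on count vectors instead of raw observations, and both the data projection and the hyperparameter projection are modelled by summing over the blocks of a partition. The names `merge` and `posterior_index` are hypothetical.

```python
def merge(values, partition):
    """Projection J -> I: sum the entries of `values` over each block of cells."""
    return tuple(sum(values[j] for j in block) for block in partition)

def posterior_index(counts, alpha):
    """Dirichlet-multinomial update acting on count vectors: alpha + counts."""
    return tuple(a + c for a, c in zip(alpha, counts))

partition = [(0, 1), (2,), (3, 4)]  # coarsens five cells into three
alpha_J = (0.5, 1.0, 2.0, 0.25, 0.25)
counts_J = (3, 0, 4, 1, 2)

# Left-hand side of (6.3): project data and hyperparameter, then update.
lhs = posterior_index(merge(counts_J, partition), merge(alpha_J, partition))
# Right-hand side of (6.3): update on the finer level, then project.
rhs = merge(posterior_index(counts_J, alpha_J), partition)
```

Both sides coincide because the update is linear in the counts and the projection is a sum, so the two operations commute.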

Consequently, a conjugate nonparametric Bayesian model can only be obtained
from marginals which are closed under sampling. Dropping either of
the two assumptions in the theorem - that the model is defined as a projective
limit and that the canonical mappings be surjective - does not lift this
restriction. If the model is not explicitly assumed to be a projective limit,
a family of marginals can always be obtained by defining $\Theta_I := g_I \Theta_D$ and
$P_I[X_I|\Theta_I] := E[(f_I P_D)[X_D|\Theta_D]|\sigma(\Theta_I)]$, etc. The components so obtained form
projective families with the initial model as their limit, and the theorem is
applicable. Similarly, if the canonical mappings are not assumed surjective,
we simply obtain a more technical statement of the theorem which requires
closure under sampling on the images of the canonical mappings. The
generalization obtained in this way is trivial, since all measures used in the
construction have to concentrate on these images. We also note, in the context
of part (2), that the projectors $\mathrm{pr}_I$ in a countable product of Polish
spaces are always open mappings.

Similar to projective limits, pullbacks preserve conjugacy: 

Proposition 4 (Pullbacks of conjugate models). Let the Bayesian model
specified by $P[X|\Theta]$ and $P^\Theta[\Theta|Y]$ be conjugate, with posterior index $(T^{(n)})_n$.
Let $\tilde{P}[\tilde{X}|\tilde{\Theta}]$ and $\tilde{P}^\Theta[\tilde{\Theta}|\tilde{Y}]$ be the respective pullbacks under $J_{\tilde{\mathcal{X}}}$ and $J_{\tilde{\mathcal{T}}}$. Then
the Bayesian model specified by $\tilde{P}[\tilde{X}|\tilde{\Theta}]$ and $\tilde{P}^\Theta[\tilde{\Theta}|\tilde{Y}]$ is conjugate, with
posterior index given by the pullbacks of $(T^{(n)})_n$ as

(6.5)  $\tilde{T}^{(n)} := J_{\tilde{\mathcal{Y}}}^{-1} \circ T^{(n)} \circ (J_{\tilde{\mathcal{X}}}^n \otimes J_{\tilde{\mathcal{Y}}})$.

Proof. As an arbitrary subset of a Polish space, $\tilde{\mathcal{T}}$ is separable, but
not in general Polish, and conditional probabilities are not guaranteed to be
regular. Since the pullback is defined by restriction, which preserves
measurability, $\tilde{P}^\Theta[\tilde{\Theta}|\tilde{Y}]$ nonetheless constitutes a well-defined regular conditional
probability. Lemma 4 ensures that the spaces $\tilde{\mathcal{X}}$, $\tilde{\mathcal{T}}$ and $\tilde{\mathcal{Y}}$ all correspond
to the same subset of the abstract probability space. Equation (6.5) is an
immediate consequence of the definitions of posterior indices and pullbacks
of parametric models. □

As an example of the previous results, we consider one of the most widely
used Bayesian nonparametric models, a Gaussian process model for regression
under uniform measurement noise [50, 56]. The purpose of the example
is to provide a concrete illustration of the abstract quantities above; we
sacrifice rigor for brevity and refer to Sec. 7 for more detailed constructions.

Example 3 (Gaussian process regression under white noise). Consider 
a regression problem on [0, 1], in which measurements $x_s \in \mathbb R$ are recorded 
at covariate locations $s \in [0,1]$. Assume each measurement to be a value 
$\theta_s$ corrupted by standard white noise $\varepsilon_s \sim \mathcal N(0, 1)$, that is, 
$X_s = \Theta_s + \varepsilon_s$. Since the noise is discontinuous, the function space 
$\mathcal X := L_2[0, 1]$ is a more adequate setting than the set of continuous 
functions considered in Example 2. Let $(e_i)_{i \in \mathbb N}$ be an orthonormal basis 
of $L_2[0, 1]$. Any $x \in L_2[0, 1]$ is uniquely representable as 
$x = \sum_i x_{\{i\}} e_i$, where $x_{\{i\}} = \langle x, e_i \rangle$. The mapping 
$\phi : x \mapsto (x_{\{i\}})_{i \in \mathbb N}$ is an isomorphism of the separable Hilbert 
spaces $L_2[0, 1]$ and $\ell_2$. Since $\ell_2 \subset \mathbb R^{\mathbb N}$, we choose the product 
projective limit $\mathcal X_D = \mathbb R^{\mathbb N}$ with canonical mappings $f_I := \mathrm{pr}_I$. In 
the terminology of Definition 1, the set $\Gamma = \ell_2$ is the image of 
$L_2[0, 1]$ under the Borel embedding $\phi$. 

To obtain a valid model on $L_2[0, 1]$ as a pullback of the Gaussian projective 
limits on $\mathcal X_D$ and $\mathcal T_D$, we need to know that the models assign outer 
measure 1 to the subset $\Gamma = \ell_2$ (cf. Sec. 4.3). Gaussian processes with 
realizations in $\ell_2$ (or, in our terminology, Gaussian projective limits which 
satisfy $P_D^*(\ell_2) = 1$) are characterized by a well-known result 
[36, Theorem 3.2]: Denote by $S(\ell_2)$ the set of all positive definite 
Hermitian operators on $\ell_2$ of "trace class", i.e. with finite trace 
$\mathrm{tr}(\Sigma) < \infty$. The Gaussian projective limit $P_D$ satisfies 
$P_D^*(\ell_2) = 1$ if and only if there are $m \in \ell_2$ and $\Sigma \in S(\ell_2)$ such that 

(6.6)    $E[X_I] = f_I(m)$    and    $\mathrm{Cov}[X_I] = (f_I \otimes f_I)(\Sigma)$ . 

To define a Bayesian model, we choose $\mathcal T = \mathcal X$, $\mathcal T_D = \mathcal X_D$ and $g_I = f_I$. 
To define a projective family of priors, let $m \in \ell_2$, $\Sigma \in S(\ell_2)$, and define 
the measures $P_I^y[\Theta_I|Y_I]$ as the Gaussian measures on $\mathcal T_I = \mathbb R^I$ with means 
$g_I(m)$ and covariance matrices $(g_I \otimes g_I)(\Sigma)$. The hyperparameter spaces 
are therefore $\mathcal Y_I := \mathbb R^I \times \mathrm{Sym}(I, \mathbb R)$, where $\mathrm{Sym}(I, \mathbb R)$ is the symmetric 
cone of real-valued $|I| \times |I|$ s.p.d. matrices. The projector $J \to I$ on the 
latter deletes all rows and columns indexed by elements of $J \setminus I$. For the 
white-noise observation model, let $\mathbb I$ denote the identity operator on $\ell_2$. 
Each marginal is chosen as the Gaussian conditional $P_I[X_I|\Theta_I; \mathbb I_I]$, i.e. 
conditional on the random mean $\Theta_I$ for fixed unit covariance. The priors 
form a projective family of measures and the observation models, by Lemma 3, 
a projective family of conditional distributions. A conjugate posterior index 
of the model is given by 

$T_I^{(n)} : (x_I^n, m_I, \Sigma_I) \mapsto \Big( m_I - \Sigma_I (\Sigma_I + \tfrac{1}{n} \mathbb I_I)^{-1} \big( m_I - \tfrac{1}{n} \sum_{k=1}^n x_I^{(k)} \big),\ \tfrac{1}{n} \Sigma_I (\Sigma_I + \tfrac{1}{n} \mathbb I_I)^{-1} \Big)$ . 

Since the covariance of the observation model is the fixed identity matrix 
$\mathbb I_I$, it is not a hyperparameter, and hence formally part of the definition 
of the mapping rather than an argument. 

The posterior index $T^{(n)}$ of the nonparametric model can be constructed 
as follows. We define a candidate function which mimics the functional form 
of the finite-dimensional posterior indices $T_I^{(n)}$: For $x^n \in \ell_2^n$, 
$m \in \ell_2$ and $\Sigma \in S(\ell_2)$, define a mapping 
$\ell_2^n \times \ell_2 \times S(\ell_2) \to \ell_2 \times S(\ell_2)$ as 

$T^{(n)}(x^n, m, \Sigma) := \Big( m - \Sigma (\Sigma + \tfrac{1}{n} \mathbb I)^{-1} \big( m - \tfrac{1}{n} \sum_{k=1}^n x^{(k)} \big),\ \tfrac{1}{n} \Sigma (\Sigma + \tfrac{1}{n} \mathbb I)^{-1} \Big)$ . 

It is straightforward to verify that (i) the maps $T_I^{(n)}$ are the 
finite-dimensional projections of $T^{(n)}$ under the projectors $f_I \circ \phi$ and 
$h_I \circ \phi$, and that (ii) the family of maps $(T_I^{(n)})_D$ is projective (i.e. 
satisfies (A.2)). Therefore, $T^{(n)}$ indeed coincides with the unique 
projective limit map on the relevant subspace $\ell_2^n \times \ell_2 \times S(\ell_2)$ of the 
projective limit space. By Theorem 3, the projective limit model is 
conjugate. Since $T^{(n)}$ maps $\ell_2^n \times \ell_2 \times S(\ell_2)$ into $\ell_2 \times S(\ell_2)$, the 
posterior again assigns full outer measure to $\ell_2 = \phi(L_2[0, 1])$. By 
Theorem 4, the pullback of the model under $\phi$ is a conjugate Gaussian 
process model on $L_2[0, 1]$. The sufficient statistics $S_I = \mathrm{Id}_{\mathbb R^I}$ of the 
marginals are trivially projective, and by Theorem 2 and Proposition 2, the 
pullback $S = \mathrm{Id}_{L_2[0,1]}$ of their projective limit is a sufficient statistic 
of the Gaussian process model. 
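
The finite-dimensional posterior index of Example 3 can be checked in the 
simplest case. The following is a minimal sketch, assuming a one-dimensional 
marginal ($|I| = 1$) with prior $\mathcal N(m, s)$ and unit noise variance; the function 
names are illustrative and do not come from the paper. It compares the gain 
form, mean $m - s(s + 1/n)^{-1}(m - \bar x)$ and variance $\tfrac{1}{n} s (s + 1/n)^{-1}$, with 
the textbook precision form of the Gaussian posterior, using exact rational 
arithmetic. 

```python
from fractions import Fraction


def posterior_index_gain_form(xs, m, s):
    """Gain form of the 1-d Gaussian posterior index (unit noise variance)."""
    n = len(xs)
    xbar = sum(xs, Fraction(0)) / n
    gain = s / (s + Fraction(1, n))          # s (s + 1/n)^{-1}
    return m - gain * (m - xbar), gain / n   # (posterior mean, posterior variance)


def posterior_index_precision_form(xs, m, s):
    """Textbook form: precisions add, means are precision-weighted."""
    n = len(xs)
    var = 1 / (1 / s + n)
    mean = var * (m / s + sum(xs, Fraction(0)))
    return mean, var


# Hypothetical numbers, chosen only for illustration.
xs = [Fraction(1), Fraction(2), Fraction(4)]
m, s = Fraction(1, 2), Fraction(3)
print(posterior_index_gain_form(xs, m, s))
print(posterior_index_precision_form(xs, m, s))
```

Both calls return the same pair, which is the one-dimensional instance of 
the update of mean and covariance described in the example. 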

Since conjugacy in parametric models is, with few exceptions, a property 
of exponential family models, we can interpret most conjugate nonpara- 
metric Bayesian models as infinite-dimensional analogues of the exponential 
family. Conjugate priors of exponential family models are characterized by 
a linear arithmetic in parameter space, as shown by Diaconis and Ylvisaker 
[14]. In particular, suppose that the marginals $P_I[X_I|\Theta_I]$ and 
$P_I^y[\Theta_I|Y_I]$ are exponential family marginals with canonical conjugate 
priors. With respect to suitable carrier measures on the spaces $\mathcal X_I$ and 
$\mathcal T_I$, the marginals are then defined by conditional densities 

(6.7)    $p_I(x_I|\theta_I) = \frac{H_I(x_I)}{Z_I(\theta_I)} e^{\langle S_I(x_I), \theta_I \rangle}$    and    $p_I(\theta_I|\lambda, \gamma_I) = \frac{e^{\langle \theta_I, \gamma_I \rangle - \lambda \log Z_I(\theta_I)}}{K_I(\lambda, \gamma_I)}$ . 

The function $S_I : \mathcal X_I \to \mathcal U_I$ is a sufficient statistic. Its range is a 
Polish topological vector space $\mathcal U_I$, which contains the parameter space 
$\mathcal T_I$ as a subspace, and is equipped with an inner product $\langle \,\cdot\,, \cdot\, \rangle$. 
$H_I$ denotes a non-negative function, and $Z_I$, $K_I$ are normalization 
functions. The prior is parametrized by $\lambda \in \mathbb R_+$, which determines 
concentration, and $\gamma_I \in \mathrm{conv}(S_I(\mathcal X_I))$, the convex hull of the image 
of $S_I$. 

In our previous notation, the Bayesian model defined by (6.7) has 
hyperparameter space $\mathcal Y_I := \mathbb R_+ \times \mathcal U_I$, with $y_I = (\lambda, \gamma_I)$. The posterior 
under observations $x_I^n = (x_I^{(1)}, \dots, x_I^{(n)})$ is given by the density 
$p_I(\theta_I | T_I^{(n)}(x_I^n, y_I))$, where the posterior index updates 
hyperparameters as $\lambda \mapsto \lambda + n$ and $\gamma_I \mapsto \gamma_I + \sum_k S_I(x_I^{(k)})$. 
Application of Theorems 3 and 4 results in an analogous representation for 
nonparametric models: 

Corollary 2 (Exponential family marginals). Let $(P_I[X_I|\Theta_I])_D$ be a 
projective family of exponential family models with sufficient statistics 
$S_I$, and let $(P_I^y[\Theta_I|Y_I])_D$ be the family of corresponding canonical 
conjugate priors. If the priors and the sufficient statistics both form 
projective families, the projective limit Bayesian model is conjugate with 
posterior index 

(6.8)    $T_D^{(n)}(x_D^n, y_D) := \big( \lambda + n,\ \gamma_D + \sum_k S_D(x_D^{(k)}) \big)$ , 

where $y_D = (\lambda, \gamma_D)$ and $S_D := \varprojlim (S_I)_D$ is the sufficient statistic of 
$P_D[X_D|\Theta_D]$. Analogously, if the model is pulled back under 
$\phi : \tilde{\mathcal X} \to \mathcal X_D$ as in (4.4), 

(6.9)    $\tilde T^{(n)}(\tilde x^n, \tilde y) := \big( \lambda + n,\ \gamma + \sum_k \tilde S(\tilde x^{(k)}) \big)$ 

is a conjugate posterior index of the pullback model. 
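
The update rule in Corollary 2 is linear in the hyperparameters, which makes 
it easy to sketch in code. The example below is only an illustration under 
stated assumptions: a finite-dimensional marginal with three categories, a 
one-hot sufficient statistic, and the hypothetical names `posterior_index` 
and `one_hot`. It shows that updating on a batch of observations agrees with 
updating sequentially. 

```python
def posterior_index(observations, lam, gamma, suff_stat):
    """Map (lam, gamma) to (lam + n, gamma + sum_k S(x_k)), coordinate-wise."""
    new_gamma = list(gamma)
    for x in observations:
        for i, s in enumerate(suff_stat(x)):
            new_gamma[i] += s
    return lam + len(observations), tuple(new_gamma)


def one_hot(x, size=3):
    """Hypothetical sufficient statistic: indicator vector of category x."""
    return tuple(1 if i == x else 0 for i in range(size))


prior = (1.0, (0.5, 0.5, 0.5))

# One batch update with three observations ...
batch = posterior_index([0, 2, 2], *prior, one_hot)

# ... equals two consecutive updates, since the index is additive.
step1 = posterior_index([0], *prior, one_hot)
step2 = posterior_index([2, 2], *step1, one_hot)
print(batch, step2)
```

The additivity is what allows the same formula to be applied marginal by 
marginal before passing to the projective limit. 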

An example of such a posterior is the Dirichlet process on the line with 
concentration $\alpha$ and base measure $G_0$, for which posterior parameters are 
updated under observations $v^{(1)}, \dots, v^{(n)}$ as 

(6.10)    $(v^{(1)}, \dots, v^{(n)}, \alpha \cdot G_0) \mapsto \frac{1}{n + \alpha} \sum_{k=1}^{n} \delta_{v^{(k)}}(\,\cdot\,) + \frac{\alpha}{n + \alpha} G_0$ . 

The next section covers this example in detail. The Gaussian process regres- 
sion above is an instance of Corollary 2 as well, although our formulation in 
Example 3 uses the standard parametrization of the Gaussian, rather than 
an exponential family parametrization adapted to (6.9). 

7. Examples. Two detailed construction examples are given in this 
section to illustrate our results: The well-known Dirichlet process [16, 32], 
and a new nonparametric Bayesian model on the infinite symmetric group. 
The steps of both constructions are (i) the definition of projective systems 
to obtain $\mathcal X_D$ and $\mathcal T_D$, (ii) the definition of finite-dimensional priors and 
likelihoods for each $I \in D$ to define a projective limit Bayesian model, and 
(iii) a pullback step to ensure that the models concentrate on the desired 
subspace of interest: the set of probability measures and the infinite 
symmetric group, respectively. By means of the results in Secs. 5 and 6, 
sufficiency and conjugacy properties of the models can then be read off from 
the properties of the marginals. 

7.1. Dirichlet Process Priors. In this example, $\tilde P^y$ is a Dirichlet 
process and $\tilde P$ its conjugate observation model. The domain of the 
Dirichlet process is assumed to be a Polish measurable space $(V, \mathcal B_V)$, i.e. 
random measures drawn from the process are convex combinations of the form 

$\theta = \sum_{k \in \mathbb N} c_k \delta_{v_k}$ with $v_k \in V$. 

7.1.1. Projective System. The finite-dimensional marginals will be 
Dirichlet and multinomial distributions. Ferguson [16] noted that a 
particularly intuitive way to index such distributions is to choose each 
$I \in D$ as a finite, measurable partition $I = (A_1, \dots, A_{|I|})$ of $V$. The 
$|I|$-dimensional Dirichlet distribution $P_I^y$ can then be interpreted as a 
random measure on the finite $\sigma$-algebra $\sigma(I)$ generated by the sets in $I$. 
Let $\mathcal H(\mathcal B_V)$ be the set of all finite partitions $I = (A_1, \dots, A_{|I|})$ with 
$A_i \in \mathcal B_V$. This set is itself not an adequate choice for $D$, since it is 
uncountable unless $V$ is finite. However, since $V$ is Polish, there exists a 
countable algebra $\mathcal Q \subset \mathcal B_V$ which generates $\mathcal B_V$. Any probability measure 
on $\mathcal B_V$ can, by Caratheodory's extension theorem, be unambiguously 
represented by its restriction to $\mathcal Q$. Bearing this in mind, we define 
$D := \mathcal H(\mathcal Q)$ as the set of finite partitions with $A_i \in \mathcal Q$. A partial order 
on $D$ is defined by $I \preceq J$ if and only if $I \cap J = J$, that is, if $J$ is a 
refinement of the partition $I$. 

For each index $I = (A_1, \dots, A_{m_I})$ in $D$, the marginal spaces are chosen 
as the spaces corresponding to a Dirichlet-multinomial Bayesian model over 
$m_I$ categories: Let the parameter space $\mathcal T_I$ be the set of probability 
distributions on the $\sigma$-algebra generated by $I$, i.e. the unit simplex 
$\triangle_I \subset \mathbb R^I$. The hyperparameter space of a Dirichlet model on $\triangle_I$ is 
$\mathcal Y_I := \mathbb R_{>0} \times \triangle_I$. 

To define the observation spaces $\mathcal X_I$, we interpret the sets $A_i \in I$ as 
categories or "bins" of a multinomial distribution. A sample in category 
$A_i$ can be encoded as $\{X_I = A_i\}$. Hence, $X_I$ takes values in $\mathcal X_I := I$. 
Both the topology $\mathrm{Top}_I$ and Borel sets $\mathcal B_I$ on $\mathcal X_I$ are generated by the 
singleton events $\{A_i\}$. 
To define suitable projectors, consider a pair $I \preceq J$ of indices, where 
$I = (A_1, \dots, A_{m_I})$ and $J = (A'_1, \dots, A'_{m_J})$. Any set $A_i \in I$ is the 
union of some sets in $J$, hence $A_i = \cup_{j \in J_i} A'_j$ for some $J_i \subset [m_J]$. 
Let $\theta_J \in \triangle_J$ be a finite probability distribution and $A'_j \in J$. We define 

(7.1)    $f_{JI}(A'_j) = A_i$ for $j \in J_i$    and    $(g_{JI} \theta_J)_i := \sum_{j \in J_i} (\theta_J)_j$ . 

In words, for any coarsening of a finite set of events $J$ to $I$, $f_{JI}$ maps 
$A'_j$ to the coarser event containing it, and $g_{JI}$ sums the corresponding 
probabilities. Since the model is of conjugate exponential family type, the 
projections $h_{JI} : \mathcal Y_J \to \mathcal Y_I$ of hyperparameters are given by 
$h_{JI} := \mathrm{Id}_{\mathbb R_+} \otimes g_{JI}$. The families of mappings $f_{JI} : J \to I$ and 
$g_{JI} : \triangle_J \to \triangle_I$ are continuous, surjective and satisfy (2.1). It is 
straightforward to verify that $(\mathcal X_I, \mathcal B(\mathcal X_I), f_{JI})_D$ and 
$(\mathcal T_I, \mathcal B(\mathcal T_I), g_{JI})_D$ are projective systems of measurable Polish spaces. 
Properties of the hyperparameter spaces carry over immediately from $\mathcal T_I$. 
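
The coarsening maps in (7.1) can be sketched in a few lines. The example 
below is an illustration only: it assumes a hypothetical four-point ground 
set in place of $V$, represents partitions as tuples of frozensets, and uses 
the invented names `f_proj` and `g_proj` for $f_{JI}$ and $g_{JI}$. It checks 
the projectivity condition, i.e. that coarsening in two steps agrees with 
coarsening in one. 

```python
def f_proj(coarse, block):
    """Map a block of a finer partition to the coarse block containing it."""
    for big in coarse:
        if block <= big:
            return big
    raise ValueError("block is not contained in any block of the coarse partition")


def g_proj(fine, coarse, theta):
    """Sum the probabilities of all fine blocks inside each coarse block."""
    return tuple(
        sum(p for block, p in zip(fine, theta) if block <= big)
        for big in coarse
    )


# Three nested partitions of the hypothetical ground set {1, 2, 3, 4}.
K = (frozenset({1}), frozenset({2}), frozenset({3}), frozenset({4}))
J = (frozenset({1, 2}), frozenset({3}), frozenset({4}))
I = (frozenset({1, 2}), frozenset({3, 4}))

theta_K = (0.125, 0.25, 0.5, 0.125)   # a point of the simplex over K
theta_J = g_proj(K, J, theta_K)
theta_I = g_proj(J, I, theta_J)

print(theta_J, theta_I, g_proj(K, I, theta_K))
print(f_proj(I, f_proj(J, frozenset({3}))), f_proj(I, frozenset({3})))
```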

What are the projective limit spaces $\mathcal X_D$ and $\mathcal T_D$ defined by the two 
systems? The set $\mathcal X_D$ consists of all collections of the form 

(7.2)    $x_D = \{ C_I \in I \mid I \in D,\ C_I \supset C_J \text{ whenever } I \preceq J \}$ . 

Whereas a draw from $X_I$ selects a single random set $C_I \in I$, a draw from 
$X_D$ selects one random set $C_I$ for each $I$. A single, "smallest" set can be 
associated with each $x_D \in \mathcal X_D$ by defining $\lim x_D := \cap_I C_I$. Unlike the 
constituent sets $C_I$, the set $\lim x_D$ is not in general an element of $\mathcal Q$, 
and we have $\mathcal Q \subset \{\lim x_D \mid x_D \in \mathcal X_D\} \subset \mathcal B_V$. In particular, the proof 
of Lemma 6 below shows that the set $\{\lim x_D \mid x_D \in \mathcal X_D\}$ contains all 
singletons $\{v\}$ for points $v \in V$, which are not contained in the countable 
set $\mathcal Q$. In analogy to the interpretation of $X_I = A_i$ as an event $A_i \in I$, 
we can interpret $X_D = x_D$ as the event $\lim x_D$ in $\mathcal B_V$. 

The projective limit $\mathcal T_D$ of parameter spaces is the set of all charges, i.e. 
of finitely additive probabilities on the algebra $\mathcal Q$. The space 
$(\mathcal T_D, \mathcal B(\mathcal T_D))$ contains the set $M(\mathcal Q)$ of countably additive probability 
measures as a measurable subset [45, Proposition 9]. For any set function 
$G \in \mathcal T_D$, the canonical maps $g_I : \mathcal T_D \to \triangle_I$ are the evaluations 
$G \mapsto (G(A_1), \dots, G(A_{m_I}))$. The fact that the space $M(\mathcal Q)$ cannot 
directly be defined as a projective limit of finite-dimensional simplices is 
an example of the projective limit's ability to encode a finitary property 
(finite additivity), but not an infinitary one (countable additivity). 

7.1.2. Bayesian Model. Each marginal Bayesian model is defined by a 
multinomial distribution $P_I[X_I|\Theta_I]$ on $m_I$ categories and by its conjugate 
prior $P_I^y[\Theta_I|Y_I]$ on $\triangle_I$. The Dirichlet distribution is a natural 
conjugate prior as in (6.7), with parameters $(\lambda, \gamma_I) := (\lambda, \alpha G_I)$, where 
$\alpha \in \mathbb R_+$ controls concentration and $G_I \in \triangle_I$ is the expected value. 
Since $\log Z_I(\theta_I) = 0$ for the Dirichlet distribution, the value of $\lambda$ does 
not affect the model and is henceforth omitted. Though $\alpha$ controls the 
concentration of the model, it acts linearly on $\theta_I$, in contrast to the 
nonlinear influence of $\lambda$ on other conjugate priors. It is easy to show that 
the multinomial and Dirichlet families so defined form a projective family 
of Bayesian models if and only if the hyperparameters are chosen 
consistently as $\gamma_I := \alpha \cdot g_I G_0$ for a fixed $\alpha \in \mathbb R_+$ and some $G_0 \in \mathcal T_D$. 

7.1.3. Pullback to $M(V)$. What remains to be done is to ensure that 
the Dirichlet process prior $P_D^y[\Theta_D|Y_D = y_D]$ defines a measure on the set of 
probability measures. 

Lemma 5 (Proof: App. C). If $V$ is Polish, the countable generating 
algebra $\mathcal Q \subset \mathcal B_V$ can be chosen such that, for any charge $G_0$ on $\mathcal Q$, 

(7.3)    $P_D^{y,*}[M(\mathcal Q) | Y_D = (\alpha, G_0)] = 1 \iff G_0 \in M(\mathcal Q)$ . 

In other words, the prior concentrates on countably additive set functions 
if and only if its hyperparameter is countably additive. We obtain a 
corresponding concentration result for the sampling model: 

Lemma 6 (Proof: App. C). Define a relation $\phi \subset V \times \mathcal X_D$ by means of 

(7.4)    $v =_\phi x_D \iff \lim x_D = \{v\}$ . 

1. $\phi$ is a mapping $V \to \mathcal X_D$, and a Borel embedding. 

2. If $\theta_D \in M(V)$ is purely atomic, $P_D^*[\phi(V) | \Theta_D = \theta_D] = 1$. 

Lemma 6 provides a suitable embedding for the pullback of $P_D[\,\cdot\,|\Theta_D]$. 
For the prior, let $\mathcal J_{\mathcal T} : M(V) \to \mathcal T_D$ be the mapping which takes a 
probability measure $\nu$ on $\mathcal B_V$ to its restriction on $\mathcal Q$. By the 
Caratheodory extension theorem, $\mathcal J_{\mathcal T}$ is injective. Since both $M(V)$ and 
the Borel subset $M(\mathcal Q) \subset \mathcal T_D$ are standard Borel spaces, $\mathcal J_{\mathcal T}$ is a Borel 
embedding (cf. Sec. 2.2). We set $\tilde{\mathcal X} := \{\{v\} \mid v \in V\}$, which we identify 
with $V$, and choose $\tilde{\mathcal T} = M(V)$ as parameter space and 
$\tilde{\mathcal Y} := \mathbb R_+ \times M(V)$ as hyperparameter space. We do not show here that draws 
from the Dirichlet process are almost surely discrete, and instead refer 
to [20]. 

Sufficient statistics of the marginal models can be defined as 
$S_I : I \to \triangle_I$ with $A_i \mapsto \delta_{\{A_i\}}$, i.e. the event $\{X_I = A_i\}$ is mapped 
to a point mass at the singleton $\{A_i\} \in \mathcal B_I$. Define $S : V \to M(V)$ by 
$v \mapsto \delta_{\{v\}}$. For any $v \in V$ and any $I \in D$, there is a unique $A_i \in I$ 
with $v \in A_i$. Hence, 

(7.5)    $(g_I \circ \mathcal J_{\mathcal T}) \circ S = S_I \circ (f_I \circ \phi)$ , 

since $g_I \circ \mathcal J_{\mathcal T}$ maps $\nu \in M(V)$ to its evaluation on the partition $I$, and 
$f_I \circ \phi$ maps $v \in V$ to the event $A_i$ in $I$ containing $v$. Therefore, $S$ is 
a pullback of the projective limit $S_D = \varprojlim (S_I)_D$. By Theorem 2 and 
Proposition 2, $S$ is a sufficient statistic for the pullback model. By 
Theorem 3 and Proposition 4, the pullback model is conjugate. In summary: 

Corollary 3. The pullback Bayesian model defined by $\tilde P[\tilde X|\tilde\Theta]$ and 
$\tilde P^y[\tilde\Theta|\tilde Y]$ is a conjugate Bayesian model with hyperparameter space 
$\mathbb R_{>0} \times M(V)$, parameter space $\{\nu \in M(V) \mid \nu \text{ discrete}\}$, and sample 
space $V$. The posterior index under $n$ observations $x^{(1)}, \dots, x^{(n)} \in V$ is 

(7.6)    $\alpha G_0 \mapsto \alpha G_0 + \sum_{i} \delta_{x^{(i)}}$ , 

and $S : V \to M(V)$ is a sufficient statistic of the model. 

The measure $\tilde P^y[\tilde\Theta | \tilde Y = (\alpha, G_0)]$ is, in the terminology of 
nonparametric Bayesian statistics, a Dirichlet process with concentration 
$\alpha$ and base measure $G_0$. 
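
The update (7.6) can be evaluated on finite partitions to see how it 
interacts with the marginals. The following sketch makes several 
assumptions for illustration: a hypothetical four-point space stands in for 
$V$, the unnormalized base measure $\alpha G_0$ is stored as a dictionary of 
point masses, and the helper names are invented here. It checks that 
evaluating the posterior hyperparameter on a coarse partition $I$ gives the 
same result as the finite-dimensional Dirichlet update (prior mass plus 
counts) on $I$, and as coarsening the result from a finer partition $J$. 

```python
from fractions import Fraction


def evaluate(measure, partition):
    """Evaluate a finite measure (dict: point -> mass) on each block."""
    return tuple(
        sum((measure.get(v, Fraction(0)) for v in block), Fraction(0))
        for block in partition
    )


def dp_posterior(alpha_g0, observations):
    """Posterior hyperparameter alpha*G0 + sum_i delta_{x_i}."""
    post = dict(alpha_g0)
    for x in observations:
        post[x] = post.get(x, Fraction(0)) + 1
    return post


alpha_g0 = {v: Fraction(1, 2) for v in "abcd"}   # alpha = 2, G0 uniform
obs = ["a", "a", "c"]
J = (frozenset("a"), frozenset("b"), frozenset("cd"))
I = (frozenset("ab"), frozenset("cd"))

post = dp_posterior(alpha_g0, obs)
post_J = evaluate(post, J)
post_I = evaluate(post, I)

# Finite-dimensional Dirichlet update carried out directly on I.
counts_I = tuple(sum(1 for x in obs if x in block) for block in I)
marginal_I = tuple(g + c for g, c in zip(evaluate(alpha_g0, I), counts_I))

# Coarsening the J-marginal of the posterior to I.
coarsened = tuple(
    sum((m for block, m in zip(J, post_J) if block <= big), Fraction(0))
    for big in I
)
print(post_J, post_I, marginal_I, coarsened)
```

Dividing `post_J` by its total mass $n + \alpha$ gives the normalized form of the 
update stated in (6.10). 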

7.2. A Nonparametric Model on Permutations. In this second example, 
the observations $x$ are elements of the infinite symmetric group $\mathbb S_\infty$, and 
the parameters are sequences $\theta \in \mathbb R^{\mathbb N}$ satisfying a certain convergence 
condition. The infinite symmetric group is the set of all permutations of 
the set $\mathbb N$ which change an arbitrary but finite number of elements. 

Models on such infinite permutations are of potential interest in two con- 
texts: (1) Rank data is modeled by permutations, and a nonparametric ap- 
proach to ranking problems motivates models on infinite permutations. In 
parametric rank data analysis, models for "partial" data, i.e. data in which 
part of each ranking is censored, are used to model "rank your favorite r 
items out of a total of n items" [18]. In this case, n is the order of the 
underlying symmetric group $\mathbb S_n$, and r the number of uncensored positions. 
Meila and Bao [44] observe that positing a given set of n items to choose 
from makes most partial ranking tasks artificial. They suggest a 
nonparametric model on $\mathbb S_\infty$ to represent more realistic tasks ("rank your 
favorite r movies", as opposed to "rank your favorite r out of these n 
movies"). 
(2) The cycles of an infinite permutation induce a partition of N, and random 
permutations hence induce random partitions. The most prominent example 
of such a model is without doubt the Chinese Restaurant Process, proposed 
by Pitman and Dubins as a distribution on infinite random permutations 
with uniform marginals [46]. 

To construct a Bayesian model on $\mathbb S_\infty$ by means of a projective limit, 
we draw on a beautiful construction recently proposed in representation 
theory by Kerov, Olshanski, and Vershik [30, 31]. This approach constructs 
a compactification $\mathfrak S$ of $\mathbb S_\infty$ as a projective limit of finite symmetric 
groups; Kerov et al. [31] refer to the elements of $\mathfrak S$ as "virtual 
permutations". We construct a Bayesian model by endowing each of the finite 
groups with a parametric model based on the Cayley distance, due to Fligner 
and Verducci [18]. We then give conditions under which the projective limit 
concentrates on the subset $\mathbb S_\infty$. 

7.2.1. Projective Limits of Symmetric Groups. The projective limit of 
Kerov et al. [31] assembles the symmetric groups $\mathbb S_1, \mathbb S_2, \dots$ sequentially, 
and we hence choose the index set $D = \{[n] \mid n \in \mathbb N\}$, ordered by inclusion. 
To define a projective system, we need a suitable notion of projection 
mappings. Given the choice of $D$, it is sufficient to consider mappings 
$f_{JI}$ for $J = [n+1]$ and $I = [n]$, which we more conveniently denote 
$f_{n+1,n}$. Intuitively, the projection should remove the entry $n+1$ from 
permutations in $\mathbb S_{n+1}$, which raises the question of how to consistently 
delete $n+1$ from, say, (3142) to obtain a valid element of $\mathbb S_3$. An 
appropriate projector can be defined as follows. Any permutation $\pi \in \mathbb S_n$ 
admits a unique representation of the form 

(7.7)    $\pi = \sigma_{k_1}(1) \sigma_{k_2}(2) \cdots \sigma_{k_n}(n)$ , 

where $k_i$ are natural numbers with $k_i \le i$, and $\sigma_i(j)$ denotes the 
transposition of $i$ and $j$. Hence, the vector $(k_1, \dots, k_n)$ is an encoding 
of $\pi$. Let $\psi_n$ be the corresponding mapping defined by 
$\psi_n : \pi \mapsto (k_1, \dots, k_n)$. Due to the constraint $k_i \le i$, which makes the 
encoding of $\pi$ unique, the mapping is a bijection $\mathbb S_n \to \prod_{m \le n} [m]$, and a 
homeomorphism of Polish spaces if both $\mathbb S_n$ and the image space are endowed 
with the discrete topology. On the encoding $\psi_{n+1}(\pi)$, we can easily define 
a natural projection by deleting the last element, i.e. as 
$(k_1, \dots, k_n, k_{n+1}) \mapsto (k_1, \dots, k_n)$, which is just the product space 
projector $\mathrm{pr}_{[n+1][n]}$. The projectors $f_{n+1,n}$ are then chosen as the 
induced mappings on the groups, hence 
$f_{n+1,n} := \psi_n^{-1} \circ \mathrm{pr}_{[n+1][n]} \circ \psi_{n+1}$. The following diagram commutes: 

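(7.8) 
$$\begin{array}{ccc}
\mathbb S_{n+1} & \xrightarrow{\;\psi_{n+1}\;} & \prod_{m \le n+1} [m] \\
\big\downarrow {\scriptstyle f_{n+1,n}} & & \big\downarrow {\scriptstyle \mathrm{pr}_{[n+1][n]}} \\
\mathbb S_n & \xrightarrow{\;\psi_n\;} & \prod_{m \le n} [m]
\end{array}$$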

The projectors $f_{n+1,n}$ have a natural group-theoretic interpretation: They 
remove the element $n+1$ from the cycle containing it. Intuitively speaking, 
application of $\sigma_{k_1}(1), \dots, \sigma_{k_n}(n)$ from the left consecutively constructs 
the cycles of $\pi \in \mathbb S_{n+1}$, pending insertion of the final element $n+1$ into 
its respective cycle. This last step is omitted by deleting 
$\sigma_{k_{n+1}}(n+1)$. The definition of $f_{n+1,n}$ is consistent with the Chinese 
Restaurant Process [46]: The image measure of the CRP marginal distribution 
on $\mathbb S_{n+1}$ under $f_{n+1,n}$ is the CRP marginal on $\mathbb S_n$. 
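
The encoding (7.7) and the induced projector can be made concrete as 
follows. This is a sketch under an explicit assumption that the text does 
not settle: the product in (7.7) is read as composition of functions with 
the rightmost factor applied first, and permutations are stored in one-line 
notation as tuples. The names `encode`, `decode` and `project` are 
illustrative. Under this convention, the code entry $k_i$ equals $\pi^{-1}(i)$ 
after the factors for $i+1, \dots, n$ have been stripped off. 

```python
def compose(p, q):
    """Return p o q, i.e. x -> p(q(x)); p[x - 1] is the image of x."""
    return tuple(p[q[x] - 1] for x in range(len(q)))


def transposition(n, i, j):
    """Transposition of i and j in S_n (the identity if i == j)."""
    p = list(range(1, n + 1))
    p[i - 1], p[j - 1] = j, i
    return tuple(p)


def encode(pi):
    """Code (k_1, ..., k_n) with pi = t(k_1,1) o ... o t(k_n,n), k_i <= i."""
    pi = tuple(pi)
    n = len(pi)
    ks = []
    for i in range(n, 0, -1):
        k = pi.index(i) + 1                        # preimage of i
        ks.append(k)
        pi = compose(pi, transposition(n, k, i))   # strip factor; now fixes i
    return tuple(reversed(ks))


def decode(ks):
    """Inverse of encode: multiply the transpositions back together."""
    n = len(ks)
    pi = tuple(range(1, n + 1))
    for i, k in enumerate(ks, start=1):
        pi = compose(pi, transposition(n, k, i))
    return pi


def project(pi):
    """Drop the last code entry: removes n+1 from the cycle containing it."""
    return decode(encode(pi)[:-1])


pi = (3, 1, 4, 2)        # the 4-cycle 1 -> 3 -> 4 -> 2 -> 1
print(encode(pi))        # (1, 1, 1, 3)
print(project(pi))       # (3, 1, 2), the 3-cycle 1 -> 3 -> 2 -> 1
```

Deleting the last code entry turns the cycle (1 3 4 2) into (1 3 2), 
which answers the question raised in the text of how to remove the entry 4 
from (3142) consistently. 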

The projections $f_{n+1,n}$ determine 
$(\mathcal X_D, \mathrm{Top}_D) := \varprojlim (\mathbb S_n, \mathrm{Top}_n, f_{n+1,n})_D$. In the projective limit 
topology $\mathrm{Top}_D$, $\mathcal X_D$ is totally disconnected and compact. In analogy to 
the finite groups, $\mathcal X_D$ is homeomorphic to the product space 
$\prod_{m \in \mathbb N} [m]$ under the encoding map 
$\psi_D : (\sigma_{k_1}(1) \sigma_{k_2}(2) \cdots) \mapsto (k_1, k_2, \dots)$. The infinite symmetric group 
$\mathbb S_\infty$ is a dense countable subset of $\mathcal X_D$. Unlike $\mathbb S_\infty$, the space $\mathcal X_D$ is 
not a group, whereas conversely, $\mathbb S_\infty$ is not compact. The projective limit 
can thus be regarded as an abstract compactification of $\mathbb S_\infty$. Elements 
$\pi \in \mathcal X_D$ are representable in the form 

(7.9)    $\pi = \sigma_{k_1}(1) \sigma_{k_2}(2) \cdots$ 

and can be interpreted as operations that iteratively permute pairs of 
elements ad infinitum. If and only if $\pi \in \mathbb S_\infty$, this process "breaks off" 
after a finite number $n$ of steps, and the encoding of $\pi$ is of the form 
$\psi_D(\pi) = (k_1, \dots, k_n, n+1, n+2, \dots)$. 

7.2.2. Distance-Based Models. A widely used class of probability 
distributions on finite symmetric groups are location-scale models of the 
form 

(7.10)    $p(\pi | \theta, \pi_0) = \frac{1}{Z(\theta)} e^{-\theta\, d(\pi, \pi_0)}$ , 

where $d$ is a metric on $\mathbb S_n$. Such models are commonly referred to as 
distance-based models in the rank data literature. Fligner and Verducci [18] 
considered the intersection of this class with another type of model: Let 
$W^{(1)}, \dots, W^{(k)}$ be a set of statistics on $\mathbb S_n$ such that the random 
variables $W^{(1)}(\pi), \dots, W^{(k)}(\pi)$ are independent if $\pi$ is distributed 
uniformly. Define a parametric model on $\mathbb S_n$ as 

(7.11)    $p(\pi | \theta) := \frac{1}{Z(\theta)} \exp\Big( -\sum_{j=1}^{k} \theta^{(j)} W^{(j)}(\pi) \Big)$ ,    $\theta \in \mathbb R^k$ . 

The moment-generating function $M(\theta)$ of this model is the product 
$M(\theta) = \prod_j M^{(j)}(\theta^{(j)})$ over the moment-generating functions $M^{(j)}$ of the 
variables $W^{(j)}(\pi)$. Hence, the partition function $Z(\theta)$ of the model 
factorizes as $Z(\theta) = M(-\theta) = \prod_j M^{(j)}(-\theta^{(j)})$, and the statistics 
$W^{(j)}(\pi)$ are independent random variables if $\pi$ is distributed uniformly. 
Fligner and Verducci [18] show that this independence is preserved if the 
model $p(\pi | \theta)$ is substituted for the uniform distribution. The models 
(7.11) coincide with distance-based models of the form (7.10) whenever the 
metric $d(\pi, \pi_0)$, for $\pi_0$ the neutral permutation and $\pi$ uniform on $\mathbb S_n$, is 
decomposable as a sum of independent random variables 
$d(\pi, \pi_0) = \sum_j W^{(j)}(\pi)$. The decomposable metrics considered in [18] are 
the Kendall metric and the Cayley metric. We will consider the Cayley 
metric $d_C(\pi, \pi')$ in the following, defined as the minimal number of (not 
necessarily adjacent) transpositions required to transform $\pi$ into $\pi'$. For 
the neutral permutation $\pi_0$, this metric satisfies 
$d_C(\pi, \pi_0) = n - \#\mathrm{cycles}(\pi)$. Consequently, $d_C$ can be decomposed into a 
sum of statistics which count the positions in $\pi$, but discount one element 
of each cycle. We hence choose 

(7.12)    $W^{(j)}(\pi) := 1 - \mathbb I\{k_j = j\} = \begin{cases} 0 & j \text{ smallest element on its cycle} \\ 1 & \text{otherwise} \end{cases}$ . 

The definition differs slightly from the one given by Fligner and Verducci 
[18], who discount the largest element on each cycle instead. The smallest 
element is a more adequate choice in the context of nonparametric 
constructions, as it is well-defined for infinite cycles. 

Independence of the variables $W^{(j)}$ is easily verified by constructing a 
uniform random permutation $\pi$ by means of $n$ iterations of the Chinese 
Restaurant Process: In step $j$, element $j$ is inserted into the current 
permutation by uniformly sampling $U \in [j]$. If $U = j$, the element is placed 
on a new cycle. Otherwise, it is inserted to the immediate right of element 
$U$ on the respective cycle. The variables $W^{(j)}$ are hence indeed 
independent and Bernoulli distributed. Since only $U = j$ creates a new 
cycle, the Bernoulli parameters are $\Pr\{W^{(j)} = 1\} = \frac{j-1}{j}$. 
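
The independence claim can be checked exhaustively for small $n$. The 
following sketch enumerates all permutations of a small symmetric group 
(here $n = 4$, an arbitrary choice), computes the vector of statistics from 
(7.12) using the "smallest element on its cycle" characterization, and 
compares the empirical joint distribution under the uniform measure with the 
product of Bernoulli probabilities $\Pr\{W^{(j)} = 1\} = (j-1)/j$. Function names 
are illustrative. 

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def w_vector(pi):
    """W^(j) = 0 if j is the smallest element of its cycle, else 1."""
    n = len(pi)
    w = [1] * n
    seen = set()
    for start in range(1, n + 1):
        if start in seen:
            continue
        w[start - 1] = 0          # first unseen element opens a new cycle
        x = start
        while x not in seen:      # walk once around the cycle
            seen.add(x)
            x = pi[x - 1]
    return tuple(w)


def product_prob(w):
    """Probability of w if the W^(j) are independent Bernoulli((j-1)/j)."""
    p = Fraction(1)
    for j, wj in enumerate(w, start=1):
        p *= Fraction(j - 1, j) if wj else Fraction(1, j)
    return p


n = 4
counts = Counter(w_vector(pi) for pi in permutations(range(1, n + 1)))
total = sum(counts.values())

factorizes = all(
    Fraction(counts[w], total) == product_prob(w)
    for w in product((0, 1), repeat=n)
)
print(factorizes)
```

The sum of the entries of `w_vector(pi)` is $n$ minus the number of cycles of 
$\pi$, i.e. the Cayley distance of $\pi$ to the neutral permutation. 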

The natural conjugate prior of the generalized Cayley model on $\mathbb S_n$ is 
given by the conditional measure $P_n^y[\Theta_n|Y_n]$ with density 

(7.13)    $p(\theta_n | \lambda, \gamma_n) := \frac{1}{K_n(\lambda, \gamma_n)} \exp\Big( \sum_j \gamma^{(j)} \theta^{(j)} - \lambda \log Z_n(\theta_n) \Big)$ . 

By means of Lemma 3, we can show: 

Lemma 7 (Proof: App. C). $(P_n[X_n|\Theta_n])_D$ and $(P_n^y[\Theta_n|Y_n])_D$ are 
projective families of conditional distributions. 

As projective limits of the two families, we obtain the regular conditional 
probabilities $P_D[X_D|\Theta_D] = \varprojlim (P_n[X_n|\Theta_n])_D$ on the projective limit 
space $\mathcal X_D = \mathfrak S$ of virtual permutations, and 
$P_D^y[\Theta_D|Y_D] = \varprojlim (P_n^y[\Theta_n|Y_n])_D$ on the parameter space 
$\mathcal T_D = \mathbb R^{\mathbb N}$. 

7.2.3. Pullback to $\mathbb S_\infty$. For nonparametric applications, infinite random 
permutations are of particular interest, i.e. observations generated by the 
model should almost surely take values $X_D \in \mathbb S_\infty$. The model constructed 
above can be guaranteed to concentrate on $\mathbb S_\infty$ by a suitable choice of 
pullbacks. 

Because the pullback model is supposed to concentrate on $\mathbb S_\infty$, the Borel 
embedding $\phi$ should be the canonical inclusion $\phi : \mathbb S_\infty \hookrightarrow \mathfrak S$. The form of 
the corresponding embedding $\mathcal J_{\mathcal T}$ on parameter space is less apparent: We 
observe that an infinite permutation $\pi$ is in $\mathbb S_\infty$ if and only if the 
infinite sequence $(W^{(j)}(\pi))_j$ contains only a finite number of ones. For a 
given parameter $\theta^{(j)}$, the probability that $W^{(j)}(\pi) = 1$ is 

(7.14)    $\Pr\{W^{(j)}(\pi) = 1\} = \frac{(j-1)\, e^{-\theta^{(j)}}}{1 + (j-1)\, e^{-\theta^{(j)}}} =: q_j(\theta^{(j)})$ , 

and we define a mapping $q : \mathbb R^{\mathbb N} \to (0,1)^{\mathbb N}$ by 
$q(\theta) := (q_j(\theta^{(j)}))_{j \in \mathbb N}$. By the Borel-Cantelli lemma, only a finite 
number of $W^{(j)}$ take value one if and only if $q(\theta) \in \ell_1$ (cf. proof of 
Lemma 8). Let $\mathcal J_{\ell_1}$ be the canonical inclusion $\ell_1(0,1) \hookrightarrow \mathbb R^{\mathbb N}$, and 
define $\mathcal J_{\mathcal T} := \mathcal J_{\ell_1} \circ q$. The pullback parameter space is then 

(7.15)    $\tilde{\mathcal T} := q^{-1}(\ell_1(0,1)) = \{\theta \in \mathbb R^{\mathbb N} \mid \|q(\theta)\|_{\ell_1} < \infty\}$ . 

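A pullback of the model under $\phi$ and $\mathcal J_{\mathcal T}$ is justified by the following 
concentration result: 

Lemma 8 (Proof: App. C). With $\phi$ and $\mathcal J_{\mathcal T}$ defined as above: 

1. The mappings $\phi$ and $\mathcal J_{\mathcal T}$ are Borel embeddings. 

2. $P_D^*[\mathbb S_\infty | \Theta_D = \theta] = 1$ if and only if $q(\theta) \in \ell_1(0,1)$. 

3. $P_D^{y,*}[\tilde{\mathcal T} | Y_D = (\lambda, \gamma)] = 1$ if $\gamma \in \tilde{\mathcal T}$. 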

The entire projective limit Bayesian model can therefore be pulled back 
under $\phi$ and $\mathcal J_{\mathcal T}$ in the sense of (4.4). The pullback model, given by 
conditionals $\tilde P[\tilde X|\tilde\Theta]$ and $\tilde P^y[\tilde\Theta|\tilde Y]$, is parametrized by 
hyperparameter sequences $\gamma_D$ satisfying $\|q(\gamma_D)\|_{\ell_1} < \infty$. The parameter 
variable almost surely takes values $\theta$ satisfying $\|q(\theta)\|_{\ell_1} < \infty$, and the 
observation variable satisfies $X \in \mathbb S_\infty$ almost surely. The sufficient 
statistics $S_I = S_n$ of the finite-dimensional models, matching the 
exponential family representation (6.7), are simply given by 
$S^{(j)}(\pi) = -W^{(j)}(\pi)$. By Theorem 2 and Proposition 2, a sufficient statistic 
$S : \mathbb S_\infty \to \{-1, 0\}^{\mathbb N}$ of the pullback model is therefore given by the 
countable vector $S(\pi)$ with components $S^{(j)}(\pi) = \mathbb I\{k_j = j\} - 1$. By 
Theorems 3 and 4, the model is conjugate. In summary: 

Corollary 4. The pullbacks $\tilde P[\tilde X|\tilde\Theta]$ and $\tilde P^y[\tilde\Theta|\tilde Y]$ define a 
projective limit Bayesian model with hyperparameter space $\mathbb R_{>0} \times \tilde{\mathcal T}$, 
parameter space $\tilde{\mathcal T}$, and observation space $\mathbb S_\infty$. The sequence $S$ with 
components $S^{(j)}(\pi) = \mathbb I\{k_j = j\} - 1$ is a sufficient statistic of the model, 
and posterior updates under observations $\pi^n = (\pi^{(1)}, \dots, \pi^{(n)})$, with 
$\pi^{(i)} \in \mathbb S_\infty$, are given by 

(7.16)    $T^{(n)}(\pi^n, (\lambda, \gamma)) = \Big( n + \lambda,\ \gamma + \sum_{i=1}^{n} S(\pi^{(i)}) \Big)$ . 

Intuitively, the parameter $\theta^{(j)}$ describes an element-wise concentration. If 
all elements of $\theta_I$ are negative in the finite-dimensional model, the 
expected value of $P_I[X_I|\Theta_I = \theta_I]$ is an anti-mode [18]. The larger the 
value of $\theta^{(j)}$, the higher the cost of deviation from the neutral 
permutation at position $j$. If such a deviation is observed in $\pi$, 
$W^{(j)}(\pi) = 1$, and (7.16) describes a decrease of the expected concentration 
at $j$ in the posterior. 

The definition of the sufficient statistics used here closely follows the 
customary presentation in the rank data literature. Alternatively, the model 
could be expressed (with a different partition function) in terms of 
sufficient statistics $S^{(j)}(\pi) = \mathbb I\{k_j = j\}$, which emphasizes the close 
relation of the model to the representation (7.7). Similarly, it may be 
useful to reparametrize the model by $\vartheta := q(\theta)$, such that concentration 
on $\mathbb S_\infty$ occurs for convergent parameter sequences. In its present form, the 
parameter sequence has to diverge instead: concentration on $\mathbb S_\infty$ requires 
$W^{(j)} = 0$ eventually, and since the variables $W^{(j)}$ are independent, this 
occurs almost surely only for diverging concentration parameters. 
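
The role of diverging parameters can be illustrated numerically. The sketch 
below assumes the form of $q_j$ obtained by tilting the uniform Bernoulli 
probability $(j-1)/j$ by $e^{-\theta^{(j)}}$, as in (7.11) and (7.14); the two 
parameter sequences are hypothetical choices made for illustration. For a 
constant sequence the partial sums of $q_j$ grow without bound, whereas for 
$\theta^{(j)} = 3 \log j$ each term is bounded by $1/j^2$ and the sums stabilize. 

```python
import math


def q(j, theta_j):
    """Pr{W^(j) = 1} under the generalized Cayley model with parameter theta_j."""
    a = (j - 1) * math.exp(-theta_j)
    return a / (1.0 + a)


def partial_sum(theta, n_terms):
    """Sum of q_j(theta(j)) for j = 1, ..., n_terms."""
    return sum(q(j, theta(j)) for j in range(1, n_terms + 1))


def constant(j):
    return 0.0                  # uniform model: q_j = (j - 1) / j


def diverging(j):
    return 3.0 * math.log(j)    # hypothetical choice: q_j < 1 / j**2


for n_terms in (10, 1000, 10000):
    print(n_terms, partial_sum(constant, n_terms), partial_sum(diverging, n_terms))
```

By the Borel-Cantelli argument in the text, only the second kind of sequence 
yields permutations that move finitely many elements almost surely. 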

8. Discussion. Our results show that conjugate nonparametric Bayesian 
models, when represented as projective limits, reflect much of the structure 
of their parametric counterparts. In particular, their sufficient statistics and 
the updates of posterior parameters are projective limits, and hence the 
precise infinite-dimensional analogues, of the respective functions associated 
with the marginals. 

8.1. Implications for Model Construction. The results suggest a con- 
struction approach for conjugate nonparametric models roughly analogous 
to the parametric case: On a given type of data, define a sufficient statis- 
tic measuring those properties of the data considered important; define the 
corresponding exponential family model and its canonical conjugate prior; 
and extend these to an infinite-dimensional model by means of a projective 
limit and a pullback. In many cases, such constructions may draw on 
existing projective limit constructions from various fields of mathematics, 
and on well-studied exponential family models to be used as marginals. The 
main technical hurdles in such a construction are the definition of a 
suitable projective system, and the proof of existence of the pullback. The 
latter step can usually be expected to be the more demanding one, since our 
representation uses the pullback as a convenient general way to formalize 
almost sure properties of the random paths of a stochastic process. This 
formalization is particularly useful for our purposes, as it allows us 
to establish results on sufficiency and conjugacy assuming that the pullback 
exists for a suitable subset of parameters. Actually verifying its existence 
for a given model, however, may involve any of the subtleties of stochastic 
process theory. As examples such as the Dirichlet process demonstrate, there 
is often a compellingly simple intuition as to how a stochastic process model 
behaves, but establishing the mathematical accuracy of this intuition can 
pose technical challenges. 

8.2. Interactions in the Posterior. We close with a heuristic observation 
that may warrant rigorous investigation in the future. If a few exceptional 
cases are neglected for the sake of argument, our results imply roughly speak- 
ing that the class of conjugate nonparametric Bayesian models corresponds 
to those with conjugate exponential family marginals, and hence to those 
admitting a representation of the form (6.9). The posterior updates of the 
marginals are described by sufficient statistics whose image has fixed, finite 
dimension. For the nonparametric model, these updates can be interpreted 
as follows: Suppose the sufficient statistics $S_I$ are bivariate, as for 
example in covariance estimation. Censored observations are obtained for 
index sets $I_1, I_2, \dots \in D$. In the posterior, an observation at $I_j$ can 
affect all dimensions $J \in D$, through any sufficient statistic $S_K$ with 
$I_j, J \preceq K$. However, even if an infinite number of repeated observations 
is obtained for one and the same $I_j \in D$, the interactions described by 
each individual $S_K$ affect only a finite subset of posterior dimensions. 
There may be interesting connections here to a family of results known as 
Pitman-Koopman theory: Under suitable regularity conditions, a parametric 
model admits a sufficient statistic of dimension finitely bounded with 
respect to sample size if and only if it is an exponential family model 
[e.g. 24]. Similar results have been obtained for certain types of Lévy 
processes [33, 34]. In summary, it may be possible to characterize 
conjugate nonparametric Bayesian models as models for 
which the complexity of interactions in the posterior is finitely bounded, in 
a manner which remains to be made precise. 

APPENDIX A: PROJECTIVE LIMITS AND PULLBACKS 

Both projective limits (inverse limits) and pullbacks are standard tech- 
niques in pure mathematics, and projective limits of probability measures 
are widely used in probability theory. Since neither is a standard topic in 
statistics, though, this appendix provides a brief survey of some relevant 
definitions and results. 

A comprehensive reference on general projective limits is Bourbaki's El- 
ements of mathematics; see Bourbaki [7, 8] for projective limits of spaces 
and functions. Key references on projective limits of measures are Bourbaki 
[9], Rao [48, 49], Mallory and Sion [43], Choksi [11] and Schwartz [52]. On 
pullbacks of measures, cf. Fremlin [19, Vol. I]. Both projective limits and 
pullbacks are common topics in category theory [e.g. 38]. 

A.1. Projective Systems and their Limits. A projective limit assembles
a mathematical object from a system of simpler objects. The assem- 
bled object may be an infinite-dimensional space constructed from finite- 
dimensional subspaces, a group constructed from subgroups, a measure as- 
sembled from its marginals, or a function defined by combining functions 
on subspaces. How the objects are "glued together" is defined by specifying 
a system of mappings, denoted $f_{JI}$ in the following, which connect "larger" 
objects to "smaller" ones. These mappings generalize the notion of a pro- 
jection in a product space. The notion of "larger" and "smaller" is defined 
in terms of a partial order on the set D of object indices. To admit a proper 
definition of a limit, and hence of an extension to infinity, the index set needs 
to be directed. 

Let $D$ be a set and $\preceq$ a partial order relation on $D$. The set is called
directed if for any two elements $I, J \in D$, there is a $K \in D$ such that
$I \preceq K$ and $J \preceq K$. Let $\{\mathcal{X}_I\}_{I \in D}$ be a family of sets
indexed by a directed set $D$. Require that for any pair $I \preceq J$ in $D$, there
is a generalized projection mapping $f_{JI} : \mathcal{X}_J \to \mathcal{X}_I$, i.e. a
mapping satisfying (2.1). Then $\langle \mathcal{X}_I, f_{JI} \mid I \preceq J \in D \rangle$,
in short $\langle \mathcal{X}_I, f_{JI} \rangle_D$, is called a projective system. Define a
space $\mathcal{X}_D$ as follows: Let $\{x_I \mid I \in D\}$ be a collection consisting of a
single point each from the spaces $\mathcal{X}_I$, for which

(A.1) $x_I = f_{JI} x_J$ whenever $I \preceq J$.

Identify any such collection with a point $x_D$, and let $\mathcal{X}_D$ be the set of
all such points. Then $\mathcal{X}_D$ is called the projective limit of the system. The
functions $f_I : \mathcal{X}_D \to \mathcal{X}_I$ defined by $x_D \mapsto x_I$ are called
canonical mappings.

The projective limit $\mathcal{X}_D$ is a subset of the product space
$\prod_{I \in D} \mathcal{X}_I$. We write $\mathrm{pr}_I$ for the canonical projection
$\mathrm{pr}_I : \prod_{J \in D} \mathcal{X}_J \to \mathcal{X}_I$. The canonical mappings
are just the restrictions $f_I = \mathrm{pr}_I|_{\mathcal{X}_D}$ of the projections to the
projective limit space. The product space may be interpreted as the set of all
functions $x$ with domain $D$ that take values $x(I) \in \mathcal{X}_I$. Consequently, the
projective limit space is precisely the subset of those functions which commute
with the mappings $f_{JI}$, in the sense that $x(I) = (f_{JI} \circ x)(J)$ whenever
$I \preceq J$.
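
The definition can be made concrete on a small finite system. The following
sketch is an illustration under assumptions that are not part of the text: the
index set consists of the non-empty subsets of {0, 1, 2} ordered by inclusion,
each space is {0,1}^I, and the generalized projections are coordinate
restrictions. All names (`space`, `f`, `projective_limit`) are hypothetical. The
code enumerates the product space and keeps exactly those collections that
satisfy the consistency condition (A.1).

```python
from itertools import combinations, product

BASE = (0, 1, 2)
# Directed index set D: non-empty subsets of BASE, ordered by inclusion.
D = [frozenset(c) for r in range(1, len(BASE) + 1) for c in combinations(BASE, r)]

def space(I):
    """X_I = {0,1}^I, with points stored as sorted tuples of (coordinate, value)."""
    coords = sorted(I)
    return [tuple(zip(coords, vals)) for vals in product((0, 1), repeat=len(coords))]

def f(J, I, x_J):
    """Generalized projection f_JI : X_J -> X_I for I a subset of J (restriction)."""
    return tuple((c, v) for c, v in x_J if c in I)

def projective_limit():
    """All collections {x_I | I in D} with x_I = f_JI(x_J) whenever I <= J."""
    limit = []
    for choice in product(*(space(I) for I in D)):
        x = dict(zip(D, choice))
        if all(x[I] == f(J, I, x[J]) for I in D for J in D if I <= J):
            limit.append(x)
    return limit

X_D = projective_limit()
print(len(X_D))
```

Because this toy index set has a largest element, every consistent collection
is determined by its component on {0, 1, 2}. For infinite directed sets without
a largest element no such shortcut exists, which is where the projective limit
becomes a genuinely new object.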

If the spaces $\mathcal{X}_I$ are endowed with additional structure, and if the
mappings $f_{JI}$ are chosen to preserve this structure under preimages, a
corresponding structure is induced on the projective limit space. Two examples
relevant in the following are topological and measurable spaces. Suppose
that each space $\mathcal{X}_I$ carries a topology $\mathrm{Top}_I$ and a
$\sigma$-algebra $\mathcal{B}_I$. The system
$\langle \mathcal{X}_I, \mathrm{Top}_I, f_{JI} \rangle_D$ is called a projective system of
topological spaces if each $f_{JI}$ is $\mathrm{Top}_J$-$\mathrm{Top}_I$-continuous. The
projective limit topology $\mathrm{Top}_D$ is defined as
$\mathrm{Top}_D := \mathrm{Top}(f_I; I \in D)$, the coarsest topology which makes all
canonical mappings $f_I$ continuous. Analogously,
$\langle \mathcal{X}_I, \mathcal{B}_I, f_{JI} \rangle_D$ is a projective system of
measurable spaces if the $f_{JI}$ are measurable, and
$\mathcal{B}_D := \sigma(f_I; I \in D)$ is called the projective limit $\sigma$-algebra.
If the $\sigma$-algebras are the Borel sets generated by the topologies
$\mathrm{Top}_I$, then $\mathcal{B}_D = \sigma(\mathrm{Top}_D)$. The general theme is that
the mappings $f_{JI}$ are chosen to be compatible with the structure defined on
the spaces $\mathcal{X}_I$, and the projective limit structure is the one generated
by the canonical maps $f_I$. In a similar manner, projective limits can be defined
for a range of other structures, such as groups (with homomorphisms $f_{JI}$), etc.

Suppose now that two families of spaces $\langle \mathcal{X}_I \rangle_D$ and
$\langle \mathcal{Y}_I \rangle_D$ are jointly indexed by the same directed set $D$, and
connected by a family $\langle w_I \rangle_D$ of mappings
$w_I : \mathcal{X}_I \to \mathcal{Y}_I$. If the mappings commute with the projection
maps, they define a projective limit mapping between the respective projective
limit spaces.

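Lemma 9 (Projective limits of functions [8, III.7.2]). Let
$\mathcal{D}_{\mathcal{X}} := \langle \mathcal{X}_I, f_{JI} \rangle_D$ and
$\mathcal{D}_{\mathcal{Y}} := \langle \mathcal{Y}_I, g_{JI} \rangle_D$ be two projective
systems with a common index set $D$. For each $I \in D$, let
$w_I : \mathcal{X}_I \to \mathcal{Y}_I$. Require that the mappings satisfy

(A.2) $g_{JI} \circ w_J = w_I \circ f_{JI}$.

Then there exists a unique mapping $w_D : \mathcal{X}_D \to \mathcal{Y}_D$ such that
$g_I \circ w_D = w_I \circ f_I$ for all $I$. In other words, the diagram on the right
below commutes if and only if the diagram on the left commutes for all
$I \preceq J \in D$:

$$
\begin{array}{ccc}
\mathcal{X}_J & \xrightarrow{\ w_J\ } & \mathcal{Y}_J \\
{\scriptstyle f_{JI}}\downarrow & & \downarrow{\scriptstyle g_{JI}} \\
\mathcal{X}_I & \xrightarrow{\ w_I\ } & \mathcal{Y}_I
\end{array}
\qquad\qquad
\begin{array}{ccc}
\mathcal{X}_D & \xrightarrow{\ w_D\ } & \mathcal{Y}_D \\
{\scriptstyle f_{I}}\downarrow & & \downarrow{\scriptstyle g_{I}} \\
\mathcal{X}_I & \xrightarrow{\ w_I\ } & \mathcal{Y}_I
\end{array}
$$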

A number of useful properties of mappings are preserved under projective
limits [7, 8]. If each $w_I$ is injective or bijective, then so is $w_D$.
Projective systems $\mathcal{D}_{\mathcal{X}}$, $\mathcal{D}_{\mathcal{Y}}$ of topological
spaces preserve continuity, i.e. $w_D$ is continuous with respect to the
projective limit topologies if and only if each $w_I$ is continuous. Projective
systems of measurable spaces preserve measurability (Lemma 1); projective
systems of algebraic structures preserve homomorphy, etc. A notable exception
is that $w_D$ need not be surjective, even if all $w_I$ are.

In a similar manner, projective limits can be defined for set functions,
and in particular for probability measures $P_I$. The domains $\mathcal{X}_I$ of the
maps $w_I$ above are replaced by the $\sigma$-algebras $\mathcal{B}_I$, and the ranges
$\mathcal{Y}_I$ by $[0, 1]$. We denote by $f_{JI}(P_J)$ the image measure under
projection, i.e. $f_{JI}(P_J) := P_J \circ f_{JI}^{-1}$.

Theorem 4 (Kolmogorov; Bochner [6]). Let
$\langle \mathcal{X}_I, \mathcal{B}_I, f_{JI} \rangle_D$ be a projective system of Polish
measurable spaces with countable index set $D$, and $\langle P_I \rangle_D$ a family of
probability measures on these spaces. If the measures commute with projection,
that is if $f_{JI}(P_J) = P_I$ whenever $I \preceq J$, there exists a uniquely
defined probability measure $P_D$ on the projective limit space
$(\mathcal{X}_D, \mathcal{B}_D)$ such that $f_I(P_D) = P_I$ for all $I \in D$.

The image measure $f_{JI}(P_J)$ is referred to as a marginal of $P_J$, and
whenever $\mathcal{X}_I \subset \mathcal{X}_J$, it is exactly the subspace marginal of
$P_J$ on $\mathcal{X}_I$. The theorem generalizes to the case of uncountable index
sets, but then requires additional conditions to ensure
$\mathcal{X}_D \neq \emptyset$. The most commonly used condition is Bochner's
"sequential maximality" [6]. Kolmogorov originally proved the theorem for
product spaces, for which sequential maximality is automatically satisfied.
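
The consistency condition of Theorem 4 can be checked exactly on a small
example. The sketch below is an illustration only and rests on assumptions not
made in the text: the index set is taken to be the natural numbers, the spaces
are {0,1}^n with truncation as projection, and the marginals are those of a
two-state Markov chain with hypothetical parameters `init` and `trans`. The
function names are likewise hypothetical.

```python
from fractions import Fraction
from itertools import product

def markov_marginal(n, init, trans):
    """P_n on {0,1}^n: law of the first n steps of a two-state Markov chain."""
    P = {}
    for x in product((0, 1), repeat=n):
        p = init[x[0]]
        for a, b in zip(x, x[1:]):
            p *= trans[a][b]
        P[x] = p
    return P

def image_measure(P_J, n_I):
    """f_JI(P_J) = P_J o f_JI^{-1}, with f_JI truncating a sequence to length n_I."""
    P_I = {}
    for x, p in P_J.items():
        P_I[x[:n_I]] = P_I.get(x[:n_I], Fraction(0)) + p
    return P_I

init = (Fraction(1, 3), Fraction(2, 3))
trans = ((Fraction(1, 2), Fraction(1, 2)), (Fraction(1, 4), Fraction(3, 4)))

# Check f_JI(P_J) = P_I for all pairs I <= J up to length 5.
consistent = all(
    image_measure(markov_marginal(J, init, trans), I) == markov_marginal(I, init, trans)
    for J in range(1, 6) for I in range(1, J + 1)
)
print(consistent)
```

Exact rational arithmetic is used so that the comparison of marginals is an
equality check rather than a tolerance check.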

A.2. Pullbacks of Measures and Functions. Projective limit constructions
of stochastic processes raise two problems: One is the effective 
restriction to countable index sets. The other is that a construction from 
finite-dimensional marginals can only express properties of the constructed 
random functions that are verifiable at finite subsets of points (such as non- 
negativity), but not infinitary properties (such as continuity, or countable 
additivity of set functions). Both problems can be addressed simultaneously 
by means of a pullback, defined via the following existence result. 

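Lemma 10 (Pullback measure [19, Section 132G]). Let $\mathcal{X}$ be a set,
$(\mathcal{Y}, \mathcal{B}_Y, \nu)$ be a measure space and
$J : \mathcal{X} \to \mathcal{Y}$ any function. If $J(\mathcal{X})$ has full outer
measure under $\nu$, that is if $\nu^*(J(\mathcal{X})) = \nu(\mathcal{Y})$, there is a
uniquely defined measure $\tilde{\nu}$ on $(\mathcal{X}, J^{-1}\mathcal{B}_Y)$ such that

(A.3) $\tilde{\nu} \circ J^{-1} = \nu$.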

The measure $\tilde{\nu}$ defined by (A.3) is called the pullback of $\nu$ under
$J$. If the pullback exists, $\nu$ can be represented as the image measure
$\nu = J(\tilde{\nu})$. The outer measure condition
$\nu^*(J(\mathcal{X})) = \nu(\mathcal{Y})$ ensures that the definition of $\tilde{\nu}$
by means of the assignment $\tilde{\nu}(J^{-1}A) := \nu(A)$ is unambiguous: If
$A, B \in \mathcal{B}_Y$ are two sets, $J^{-1}A = J^{-1}B$ does not imply $A = B$.
Hence, $\nu$ may assign different measures to $A$ and $B$, in which case it is not
possible to assign a consistent value to $J^{-1}A = J^{-1}B$ under the pullback.
However, this problem does not occur on the image $J(\mathcal{X})$, since
$J^{-1}A = J^{-1}B$ does imply $(A \triangle B) \cap J(\mathcal{X}) = \emptyset$. Thus, if
$\nu^*(J(\mathcal{X})) = \nu(\mathcal{Y})$, any differences between $A$ and $B$ are
consistently assigned measure zero.
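
The role of the outer measure condition can be seen on a finite example. The
sketch below is an illustration under simplifying assumptions: all spaces are
finite and carry the power set as $\sigma$-algebra, so that outer measure and
measure coincide. The names `pullback`, `nu_ok` and `nu_bad` are hypothetical.
The function attempts the assignment $J^{-1}A \mapsto \nu(A)$ and reports failure
when two sets with the same preimage carry different mass.

```python
from fractions import Fraction
from itertools import chain, combinations

def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def pullback(X, Y, nu, J):
    """Try to define the pullback of nu under J on the sets J^{-1}A, A subset of Y.

    Returns a dict {J^{-1}A: nu(A)}, or None if two sets A, B with the same
    preimage have different measure (the assignment is then ambiguous).
    """
    result = {}
    for A in powerset(Y):
        pre = frozenset(x for x in X if J[x] in A)
        mass = sum((nu[y] for y in A), Fraction(0))
        if result.setdefault(pre, mass) != mass:
            return None
    return result

X = {"a", "b"}
Y = {0, 1, 2}
J = {"a": 0, "b": 1}            # the image J(X) = {0, 1} misses the point 2

# nu({0, 1}) = nu(Y): the image has full (outer) measure.
nu_ok = {0: Fraction(1, 4), 1: Fraction(3, 4), 2: Fraction(0)}
# nu({0, 1}) < nu(Y): mass sits outside the image.
nu_bad = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

print(pullback(X, Y, nu_ok, J) is not None)
print(pullback(X, Y, nu_bad, J) is None)
```

In the second case the sets {0} and {0, 2} have the same preimage but different
measure, which is exactly the ambiguity described above.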

The arguably most important application of pullbacks of measures is the
restriction of a measure to a non-measurable subspace: Let
$\mathcal{X} \subset \mathcal{Y}$ be an arbitrary subspace, and $\nu$ a measure on
$\mathcal{Y}$. If the subspace has full outer measure
$\nu^*(\mathcal{X}) = \nu(\mathcal{Y})$, the measure $\nu$ has a uniquely defined
pullback $\tilde{\nu}$ under the canonical inclusion map
$\mathcal{X} \hookrightarrow \mathcal{Y}$. The measure $\tilde{\nu}$ lives on the
measurable space $(\mathcal{X}, \mathcal{B}_Y \cap \mathcal{X})$, and assigns measure
$\tilde{\nu}(A \cap \mathcal{X}) = \nu(A)$ to each intersection of a measurable set
$A \in \mathcal{B}_Y$ with $\mathcal{X}$. Hence, $\tilde{\nu}$ can be regarded as the
restriction of $\nu$ to $\mathcal{X}$.

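As for measures, pullbacks can be defined for functions. Let
$J_X : \tilde{\mathcal{X}} \to \mathcal{X}$ and
$J_Y : \tilde{\mathcal{Y}} \to \mathcal{Y}$ be two functions. A pullback of a function
$f : \mathcal{X} \to \mathcal{Y}$ is any function
$\tilde{f} : \tilde{\mathcal{X}} \to \tilde{\mathcal{Y}}$ for which the following
diagram commutes:

$$
\begin{array}{ccc}
\tilde{\mathcal{X}} & \xrightarrow{\ \tilde{f}\ } & \tilde{\mathcal{Y}} \\
{\scriptstyle J_X}\downarrow & & \downarrow{\scriptstyle J_Y} \\
\mathcal{X} & \xrightarrow{\ f\ } & \mathcal{Y}
\end{array}
$$

Conversely, if $\tilde{f}$ is given, any function $f$ for which the diagram
commutes is called a pushforward of $\tilde{f}$.

The definitions of pullbacks for measures and functions are compatible,
in the sense that the simultaneous pullback of a measure and an integrable
function under the same mapping preserves the integral: Let
$\mathcal{Y} = \tilde{\mathcal{Y}} = \mathbb{R}$, and let $(\mathcal{X}, \mathcal{C}, \nu)$
be a measure space such that $J_X(\tilde{\mathcal{X}})$ has full outer measure
$\nu^*(J_X(\tilde{\mathcal{X}})) = \nu(\mathcal{X})$. Let $f$ be
$\mathcal{C}$-measurable, non-negative and $\nu$-integrable. Then $\tilde{f}$ is
$J_X^{-1}\mathcal{C}$-measurable and $\tilde{\nu}$-integrable. Since $\nu$ is the
image measure of $\tilde{\nu}$ under $J_X$,

(A.4) $\int_{\tilde{\mathcal{X}}} \tilde{f} \, d\tilde{\nu} = \int_{\mathcal{X}} f \, d\nu$.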

APPENDIX B: PROOF OF THEOREM 3 

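Proof of (1). Let $k_I$ be the probability kernel corresponding to
$(T_I^{(n)})_n$ in (6.1). Since the kernels
$k_I : \mathcal{B}_I \times \mathcal{W}_I \to [0, 1]$ live on different spaces
$\mathcal{W}_I$, they do not themselves form a projective family. To construct
projective kernels $k'_I$, let
$T_D^{(n)} := \varprojlim \langle T_I^{(n)} \rangle_D$ be the projective limit of the
posterior indices for each $n \in \mathbb{N}$. Let
$\mathcal{W}_D := \varprojlim \langle \mathcal{W}_I, h_{JI} \rangle_D$. Denote by
$\mathcal{R} \subset \mathcal{W}_D$ the set of possible values of $(T_D^{(n)})_n$,

(B.1) $\mathcal{R} := \bigcup_{n} T_D^{(n)}(\mathcal{X}_D^n \times \mathcal{Y}_D)$.

For any $w_D \in \mathcal{R}$ and $A_I \in \mathcal{B}_I$, define
$k'_I(A_I, w_D) := k_I(A_I, h_I w_D)$. The functions so defined form a family of
kernels $\mathcal{B}_I \times \mathcal{R} \to [0, 1]$. This family is projective: Let
$x_D^n \in \mathcal{X}_D^n$ and $y_D \in \mathcal{Y}_D$. Since the posterior indices are
projective,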

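$k_J(g_{JI}^{-1}A_I, h_J T_D^{(n)}(x_D^n, y_D)) =_{a.e.} P_J^{\Theta}[g_{JI}^{-1}A_I \mid X_J^n = f_J x_D^n, Y_J = h_J y_D]$

(B.2) $=_{a.e.} P_I^{\Theta}[A_I \mid X_I^n = f_I x_D^n, Y_I = h_I y_D]$

$=_{a.e.} k_I(A_I, h_I T_D^{(n)}(x_D^n, y_D))$.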

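Hence, $k'_J(g_{JI}^{-1}A_I, w_D) = k'_I(A_I, w_D)$ for any $w_D \in \mathcal{R}$, and
$\langle k'_I \rangle_D$ is a projective family. Let
$k'_D := \varprojlim \langle k'_I \rangle_D$ be the projective limit, as guaranteed by
Theorem 1. Regarded as functions on $\Omega$, the kernels $k'_I$ satisfy

(B.3) $k'_I(A_I, T_D^{(n)}(X_D^n(\omega), Y_D(\omega))) =_{a.e.} P_I^{\Theta}[A_I \mid X_I^n, Y_I](\omega)$.

Since
$P_D^{\Theta}[\,\cdot \mid X_D^n, Y_D] = \varprojlim \langle P_I^{\Theta}[\,\cdot \mid X_I^n, Y_I] \rangle_D$,
uniqueness up to equivalence in Theorem 1 implies

(B.4) $k'_D(A, T_D^{(n)}(x_D^n, y_D)) =_{a.e.} P_D^{\Theta}[A \mid X_D^n = x_D^n, Y_D = y_D]$.

Therefore, $(T_D^{(n)})_n$ is a posterior index of the projective limit model with
corresponding kernel $k'_D$.

Since the posterior index $(T_D^{(n)})_n$ consists of projective limits of
measurable mappings, conjugacy of all marginals implies
$T_D^{(n)}(\mathcal{X}_D^n \times \mathcal{Y}_D) \subset \mathcal{Y}_D$ for any
$n \in \mathbb{N}$, and hence conjugacy of the projective limit model. □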

For part (2), the existence of a posterior index for each marginal is estab- 
lished by means of the following lemma. 

Lemma 11. Let $J_X : \tilde{\mathcal{X}} \to \mathcal{X}$ and
$J_Y : \tilde{\mathcal{Y}} \to \mathcal{Y}$ be continuous functions between Polish
spaces. Suppose that a measurable function
$\tilde{f} : \tilde{\mathcal{X}} \to \tilde{\mathcal{Y}}$ is given. If $J_X$ is open or
closed, there exists a pushforward of $\tilde{f}$ which is measurable.

The proof of the lemma draws on the concept of a selector [e.g. 29]. For
a given correspondence $R$ (a relation) on a product set $A \times B$, a
selector is a function $\delta : A \to B$ with $\delta(a) \in R(a)$, i.e. an
assignment which transforms the set-valued map $a \mapsto R(a)$ into a function by
selecting a single element of the set $R(a)$ for each $a$. In our case, the
correspondence of interest is the preimage $J_X^{-1}$. A selector can be
constructed for any correspondence by invoking the axiom of choice, but will in
general be too complicated to be of any use. Under additional regularity
conditions on the correspondence and the underlying spaces, the selection
theorem of Kuratowski and Ryll-Nardzewski [29] guarantees the existence of a
Borel-measurable selector.

Proof of Lemma 11. By the selection theorem [29, Theorem 12.16],
a correspondence between Polish spaces admits a measurable selector if it
is weakly measurable and its values are closed non-empty sets. We have
to show that $J_X^{-1}$ satisfies these conditions. The upper inverse under the
correspondence of a set $A \subset \tilde{\mathcal{X}}$ is by definition
$\{x \in \mathcal{X} \mid J_X^{-1}x \subset A\}$, which in this case is just $J_X(A)$.
If $J_X$ is open, the upper inverse $J_X(A)$ of any open set
$A \subset \tilde{\mathcal{X}}$ is in $\mathcal{B}(\mathcal{X})$, which makes the
correspondence weakly measurable. Similarly, if $J_X$ is closed, the upper
inverse of any closed set is in $\mathcal{B}(\mathcal{X})$, hence $J_X^{-1}$ is
measurable, and in particular weakly measurable since $\mathcal{X}$ is Polish. The
singletons are closed, hence by continuity, $J_X^{-1}\{x\}$ is closed, and as a
preimage non-empty. We note that the analogous result for pullbacks instead of
pushforwards follows mutatis mutandis. □

Proof of (2). Let $k_D$ be the kernel corresponding to the posterior index
$(T_D^{(n)})_n$ which makes the projective limit model conjugate. The marginals
form a projective system. Hence for any $I \in D$ and
$A_I \in \mathcal{B}(\mathcal{T}_I)$,

(B.5) $k_D(g_I^{-1}A_I, T_D^{(n)}(x_D^n, y_D)) =_{a.e.} P_I^{\Theta}[A_I \mid Y_I = h_I T_D^{(n)}(x_D^n, y_D)]$

is a valid version of the posterior
$P_I^{\Theta}[A_I \mid X_I^n = f_I x_D^n, Y_I = h_I y_D]$. Since the canonical mappings
are surjective, any hyperparameter $y_I$ and sample $x_I^n$ is representable in
this form, and the marginal model is closed under sampling since
$h_I T_D^{(n)}(x_D^n, y_D) \in \mathcal{Y}_I$. By the same identity, any measurable
mappings $(T_I^{(n)})_n$ satisfying (6.4) form a posterior index of the marginal
model. A mapping $T_I^{(n)}$ satisfies (6.4) if it is a pushforward of
$T_D^{(n)}$. If the canonical mappings are open or closed, the existence of such
measurable mappings $T_I^{(n)}$ follows from Lemma 11. □

APPENDIX C: PROOFS FOR SECTION 7 

Proof of Lemma 5. Since $V$ is Polish, its topology is metrizable by
some metric $d$. The space is separable, and hence has a dense, countable
subset $U \subset V$. Let $\mathcal{U}$ be the set of closed $d$-balls of the form

(C.1) $\mathcal{U} := \{B(v, r) \mid v \in U, r \in \mathbb{Q}_+\}$,

and let $\mathcal{Q} = \mathcal{Q}(\mathcal{U})$ be the smallest algebra containing
$\mathcal{U}$. Since $\mathcal{U}$ is countable, so is $\mathcal{Q}(\mathcal{U})$. The
statement of the Lemma then follows from [45, Theorem 1], which states that
$P_D^{\Theta,*}[M(\mathcal{Q}(\mathcal{U})) \mid Y_D = (\alpha, G_0)] = 1$ holds if and
only if $G_0$ is countably additive on $\mathcal{Q}(\mathcal{U})$. □

Proof of Lemma 6. (1) $\phi$ is a mapping: We have to argue that, whenever
the set $W \subset V$ is a singleton, say $W = \{v\}$, there is exactly one
$x_D \in \mathcal{X}_D$ with $\lim x_D = W$. For any
$x_D = \{C_I \mid I \in D\} \in \mathcal{X}_D$, by definition, $\lim x_D \subset C_I$
for all $I$. Hence, every partition $I \in D$ contains exactly one set $A_I$ with
$v \in A_I$. (Note that no such set need exist if $W$ is not a singleton.)
Therefore, $x_D := \{A_I \mid I \in D\}$ is the only element of $\mathcal{X}_D$
satisfying $\lim x_D = \{v\} = W$.

$\phi$ is measurable: Since the $\sigma$-algebra on $\mathcal{X}_D$ is the projective
limit $\mathcal{B}_D$, $\phi$ is measurable if and only if each of the mappings
$f_I \circ \phi$ is measurable. For any $v \in V$, the image
$(f_I \circ \phi)(v) = A_I$ is the unique set $A_I \in I$ for which $v \in A_I$.
The preimage of $A_I \in I$ is therefore simply

(C.2) $(f_I \circ \phi)^{-1}\{A_I\} = \{v \in V \mid v \in A_I\} = A_I$,

and measurable since $A_I \in \mathcal{Q} \subset \mathcal{B}_V$.

As a mapping onto its image, $\phi$ has a measurable inverse: By definition,
$\phi$ is trivially injective. For measurability of the inverse on $\phi(V)$, we
have to show $\phi(A) \in \mathcal{B}_D \cap \phi(V)$ for every
$A \in \mathcal{B}_V$, or equivalently, for every $A \in \mathcal{Q}$. For any
$A \in \mathcal{Q}$, there is some $I \in D$ with $A \in I$, and hence
$\{A\} \in \mathcal{B}_I$. The singleton $\{A\}$ is the base of the cylinder
$f_I^{-1}\{A\} = \{x_D \mid \lim x_D \subset A\} \in \mathcal{B}_D$. Then
$\phi(A) = f_I^{-1}\{A\} \cap \phi(V)$ and hence
$\phi(A) \in \mathcal{B}_D \cap \phi(V)$.

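(2) Let $\theta_D$ be purely atomic of the form
$\theta_D = \sum_{i \in \mathbb{N}} c_i \delta_{v_i}$. We will show

(C.3) $P_D[\{x_D\} \mid \Theta_D = \theta_D] = c_i \iff \lim x_D = \{v_i\}$.

The right-hand side is in turn equivalent to $x_D = \phi(v_i)$. Given that (C.3)
holds, the proof is complete: Since $\{v_i\} \subset V$ for all
$i \in \mathbb{N}$, (C.3) implies

(C.4) $1 = \sum_{i} P_D[\{\phi(v_i)\} \mid \Theta_D = \theta_D] \le P_D^{*}[V \mid \Theta_D = \theta_D] \le 1$.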

To verify (C.3), first observe that for any $v \in V$, there is a decreasing
sequence of sets $Q_n \in \mathcal{Q}$ with $\lim Q_n = \{v\}$. To see this, recall
the definition of $\mathcal{Q}$: The algebra is generated by compact balls centered
at the points in the subset $U \subset V$. Since $U$ is dense, there is a sequence
$u_n \in U$ with $\lim u_n = v$ and $d(u_n, v) < \frac{1}{n}$. Set
$Q_n := B(u_n, \frac{1}{n})$. Hence, $v \in Q_n$ for all $n$, and
$v \in \lim Q_n$. On the other hand,
$B(u_n, \frac{1}{n}) \subset B(v, \frac{2}{n})$, and as the balls are compact,
$B(v, \frac{2}{n}) \searrow \{v\}$.

Given such a sequence $(Q_n)_n$, there is a sequence
$I_1 \preceq I_2 \preceq \cdots$ of partitions in $D$ such that $Q_n \in I_n$ for
all $n$. In the representation $x_D = \{C_I \mid I \in D\}$, we therefore have
$C_{I_n} = Q_n$. For $x_D = \phi(v_i)$,

$P_D[\{x_D\} \mid \Theta_D = \theta_D] = \lim_n P_{I_n}[\{Q_n\} \mid \Theta_{I_n} = g_{I_n}\theta_D] = \lim_n \theta_D(Q_n) = c_i$.

□

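Proof of Lemma 7. To show that both
$\langle P_n[\pi_n \mid \Theta_n] \rangle_{\mathbb{N}}$ and
$\langle P_n^{\Theta}[\Theta_n \mid Y_n] \rangle_{\mathbb{N}}$ are projective families of
conditional distributions, we appeal to Lemma 3. First consider the models
$P_n[\pi_n \mid \Theta_n]$. For $\pi_n = \sigma_{k_1}(1) \cdots \sigma_{k_n}(n)$, the
preimage $f_{n+1,n}^{-1}\pi_n$ consists of the permutations
$\pi_{n+1} = \sigma_{k_1}(1) \cdots \sigma_{k_n}(n)\sigma_m(n+1)$ for
$m = 1, \ldots, n+1$. For the sampling distributions, fix
$\theta_{n+1} \in \mathcal{T}_{n+1}$, and let
$\theta_n = \mathrm{pr}_{n+1,n}\theta_{n+1}$. Then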

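(C.5) $P_{n+1}[f_{n+1,n}^{-1}\pi_n \mid \Theta_{n+1} = \theta_{n+1}] = P_n[\pi_n \mid \Theta_n = \theta_n] \cdot \frac{\left(\sum_{m=1}^{n} e^{-\theta^{(n+1)}}\right) + 1}{1 + n e^{-\theta^{(n+1)}}} = P_n[\pi_n \mid \Theta_n = \theta_n]$.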

Lemma 3 requires a product space structure of the sample space and is thus
not directly applicable on the groups $S_n$. However, the encodings $\psi_n$ map
into a product space, and we may equivalently consider the image measures
$\psi_n(P_n)$ on $\prod_{m \le n}[m]$. By (C.5), the image measures under $\psi_n$
satisfy

(C.6) $\mathrm{pr}_{n+1,n} \circ \psi_{n+1}(P_{n+1}[\,\cdot \mid \Theta_{n+1} = \theta_{n+1}]) = \psi_n(P_n[\,\cdot \mid \Theta_n = \theta_n])$,

which establishes (3.9). By Lemma 3, the images form a projective family of
conditional probabilities under the projections $\mathrm{pr}_{n+1,n}$, and hence by
(7.8), so do $P_n[\pi_n \mid \Theta_n]$ under $f_{n+1,n}$.
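
The marginalization step behind (C.5) and (C.6) can be checked numerically on
the encoded product space. The sketch below is an illustration under an
assumption that should be compared against the model definition in Section 7:
the encoded sampling model on $\prod_{m \le n}[m]$ is taken to weight a code
$(k_1, \ldots, k_n)$ by $e^{-\theta^{(j)}}$ for every coordinate with $k_j \neq j$.
The parameter values and the names `encoded_model` and `project` are
hypothetical.

```python
import math
from itertools import product

def encoded_model(n, theta):
    """Assumed model on prod_{m<=n} [m]: weight exp(-theta_j) whenever k_j != j."""
    weights = {}
    for k in product(*(range(1, m + 1) for m in range(1, n + 1))):
        weights[k] = math.exp(-sum(theta[j] for j in range(n) if k[j] != j + 1))
    Z = sum(weights.values())
    return {k: w / Z for k, w in weights.items()}

def project(P_next):
    """Image measure under pr_{n+1,n}, which drops the last coordinate."""
    P = {}
    for k, p in P_next.items():
        P[k[:-1]] = P.get(k[:-1], 0.0) + p
    return P

theta = [0.3, 1.2, 0.7, 2.0, 0.5]       # hypothetical parameter values

# Largest deviation between the projected (n+1)-model and the n-model.
max_gap = max(
    abs(project(encoded_model(n + 1, theta))[k] - p)
    for n in range(1, 5)
    for k, p in encoded_model(n, theta).items()
)
print(max_gap < 1e-12)
```

Under this assumed form the last coordinate contributes the factor
$1 + n e^{-\theta^{(n+1)}}$ to both the summed weights and the normalization, so
the two cancel, which is the cancellation used in (C.5).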

For the priors, which are defined on the product spaces $\mathbb{R}^n$, Lemma 3
can be applied directly. Since $Z_n = \prod_j Z^{(j)}$, the partition function
$K_n$ factorizes as
$K_n(\lambda, \gamma_n) = \prod_j K^{(j)}(\lambda, \gamma^{(j)})$. The projection
$(\mathrm{pr}_{n+1,n}P_{n+1}^{\Theta})[\Theta_n \mid Y_n]$ therefore has density

$\int p_{n+1}(\theta_{n+1} \mid \lambda, \gamma_{n+1}) \, d\theta^{(n+1)} = p_n(\theta_n \mid \lambda, \gamma_n)$,

which establishes (3.9). Hence, $P_n^{\Theta}[\Theta_n \mid \lambda, \gamma_n]$ is a
projective family of conditionals by Lemma 3. □

Proof of Lemma 8. (1) As a canonical inclusion, $\phi$ is an embedding
and hence a Borel embedding. Regarding $J_{\mathcal{T}}$, first note that the mapping
$q : \mathcal{T} \to (0, 1)^{\mathbb{N}}$ is injective and continuous, hence
measurable. Its image $\ell_1(0, 1)$ is a subset of the Polish space
$(0, 1)^{\mathbb{N}}$, and since convergence of a sequence in $(0, 1)^{\mathbb{N}}$ is a
measurable event in the tail $\sigma$-algebra, $\ell_1(0, 1)$ is Borel and hence
itself Polish. As a mapping onto its image, $q$ is surjective, and as a
measurable bijection between Polish spaces, it has a measurable inverse. Since
$J_{\ell_1}$ is again a canonical inclusion, the composition
$J_{\mathcal{T}} = J_{\ell_1} \circ q$ is a Borel embedding.

(2) A virtual permutation $\pi$ is an element of $S_\infty$ if and only if
$\sum_j W^{(j)}(\pi) < \infty$. If this is the case, all but a finite number of
entries of $\pi$ form their own cycle, and hence $\pi \in S_\infty$. If the sum
diverges, at least one cyclic set contains an infinite number of elements. The
random variables $W^{(j)}(\pi)$ are independent under the model. Hence, by the
Borel-Cantelli lemma, the sum converges if and only if the sum of probabilities
$\Pr\{W^{(j)}(\pi) = 1\}$ converges, i.e. if $q(\theta) \in \ell_1$, and hence if
$\theta \in \mathcal{T}$.

(3) The random variables $\Theta_D^{(j)}$ are independent given the
hyperparameters. By the zero-one law, the event
$\{\Theta_D \in \mathcal{T}\} = \{q(\Theta_D) \in \ell_1\}$, i.e. the event that the
random sequence $(\Theta_D^{(j)})_j$ diverges, has probability either zero or
one. The variables have expectation $E[\Theta_D^{(j)}] = \gamma_D^{(j)}$. Each
component of $q$ by definition satisfies $q_j(t) \to 0$ if $t \to +\infty$. Hence,
$\gamma_D \in \mathcal{T} = q^{-1}(\ell_1)$ implies $\gamma_D^{(j)} \to \infty$ as
$j \to \infty$. Thus for any $\epsilon > 0$, the expectations satisfy
$E[\Theta_D^{(j)}] > \epsilon$ for a cofinite number of indices $j$, and
$\Pr\{\Theta_D \in \mathcal{T}\} = 1$. □